Abstract: Financial econometrics has become an increasingly popular research
field. In this paper we review a few parametric and nonparametric
models and methods used in this area. After introducing several widely 
used continuous-time and discrete-time models, we study in detail depen- 
dence structures of discrete samples, including Markovian property, hid- 
den Markovian structure, contaminated observations, and random samples. 
We then discuss several popular parametric and nonparametric estima- 
tion methods. To avoid model mis-specification, model validation plays a 
key role in financial modeling. We discuss several model validation tech- 
niques, including pseudo-likelihood ratio test, nonparametric curve regres- 
sion based test, residuals based test, generalized likelihood ratio test, si- 
multaneous confidence band construction, and density based test. Finally, 
we briefly touch on tools for studying large sample properties. 

1. Introduction

Over the past few decades, financial econometrics has become an increasingly 
popular research field among the economics, finance, statistics and probabil- 
ity communities, and such a trend will undoubtedly continue. We refer to the 
books by Franke et al. [84], Hull [105] and Neftci [138] for an elementary
introduction to mathematical finance; Duffie [60] for asset pricing theory; Steele [150],
Karatzas and Shreve [116] and Karlin and Taylor [117] for extensive treatments 
of stochastic calculus and martingales; Fan and Yao [77] and Li and Racine 
[127] for nonparametric methods in time series; Gao [87] for semi-parametric 
methods in econometrics; and Tsay [155] for an excellent exposition of financial 
time series analysis among others. In this survey paper, we provide a selective 
overview of some popular parametric and nonparametric models and methods 
in financial econometrics. 

One of the main objectives of financial econometrics is to understand and 
model the evolving dynamics behind the financial markets. To model the price 
dynamics of assets that are subject to uncertainties, various continuous-time
models in the form of stochastic differential equations and discrete-time series 
models have been proposed with the hope that they could provide a reasonable 
approximation to the true data-generating dynamics. In Sections 2 and 3, we 



Z. Zhao/Nonparametric methods in financial econometrics 



review some popular continuous-time and discrete-time models, respectively. In 
practice, since only discrete observations are available, dependence structures 
of such discrete observations are discussed in Section 4. 

Given a model, a natural problem is to estimate the unknown quantities of 
the model based on discrete observations. Section 5 reviews various parameter 
estimation methods when we have sufficient prior knowledge that the model 
has a parametric form with unknown parameters. Nonparametric models can 
reduce modeling bias by imposing no specific model structure other than certain 
smoothness assumptions, and therefore they are particularly useful when we 
have little information or we want to be flexible about the underlying model.
Section 6 gives a brief account of some useful nonparametric methods. Since the 
payoff for derivatives depends critically on the price process of the underlying 
security, it is very important that the price model of the underlying security be 
correctly specified. Such model validation problems are addressed in Section 7. 
Section 8 contains some useful tools for the study of large sample properties of 
parametric and nonparametric estimates. 

Fan [75] gives an excellent overview of nonparametric methods in financial 
econometrics. The present paper adds new material that was not covered by Fan 
[75], including jump diffusion models, stochastic volatility models, discrete-time 
models, dependence structure of discrete samples, more detailed discussion on 
parametric and nonparametric methods and model validations, and some tools 
for studying large sample properties. 

2. Continuous-time models 

2.1. Continuous-time diffusion models 

A European call option gives the holder the right to buy the underlying asset
$S_t$ at the expiration date, or maturity, $T$ for a certain strike price $K$, but the
holder does not have to exercise that right. Therefore, the payoff for a European
call option written on $S_t$ is $\max(S_T - K, 0)$, and this payoff depends critically on the
behavior of the underlying asset $S_t$. For an introduction to financial derivatives,
see Hull [105].

As a milestone of quantitative finance, Black and Scholes [39] assume the
following model for $S_t$ to derive their celebrated pricing formula for European
call options:

$$dS_t = \mu S_t\,dt + \sigma S_t\,dW_t, \tag{2.1}$$

where $\{W_t\}_{t\ge0}$ is a standard Brownian motion, and $\mu$ and $\sigma$ are the drift and
diffusion coefficients, respectively. By Ito's Lemma, the solution of (2.1) is the
geometric Brownian motion (GBM)

$$S_t = S_0\exp[(\mu - \sigma^2/2)t + \sigma W_t]. \tag{2.2}$$



As the simplest model for modeling stock prices, GBM is still widely used in 
the modern financial community. 
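
As an illustration of (2.2), the following Python sketch samples a GBM path directly from the closed-form solution instead of discretizing (2.1). The function name `simulate_gbm`, the parameter values and the use of the standard-library random generator are illustrative assumptions and are not taken from the cited literature. The final comparison relies on the standard fact that $E(S_T) = S_0\exp(\mu T)$.

```python
import math
import random

def simulate_gbm(s0, mu, sigma, horizon, n_steps, rng):
    """Sample S_t on an equally spaced grid using the closed form (2.2)."""
    dt = horizon / n_steps
    w = 0.0                                   # running Brownian motion W_t
    path = [s0]
    for k in range(1, n_steps + 1):
        w += math.sqrt(dt) * rng.gauss(0.0, 1.0)    # W_{t+dt} - W_t ~ N(0, dt)
        t = k * dt
        path.append(s0 * math.exp((mu - 0.5 * sigma ** 2) * t + sigma * w))
    return path

rng = random.Random(0)
# Terminal values S_T for T = 1, drawn with a single step each (illustrative parameters).
terminal = [simulate_gbm(100.0, 0.05, 0.2, 1.0, 1, rng)[-1] for _ in range(20000)]
sample_mean = sum(terminal) / len(terminal)
print(sample_mean, 100.0 * math.exp(0.05))    # Monte Carlo mean versus S_0 exp(mu T)
```

Because the closed form is used, the simulated values are positive by construction and carry no discretization error, whatever the grid spacing.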

In the "risk-neutral" world, we would expect that the interest-discounted
payoff with the discount factor $\exp(-\int_0^t r_s\,ds)$ for compound interest $r_t$ should
be a martingale and hence all stocks earn the same rate as the risk-free rate $r_t$.
In the early finance literature, interest rates are considered to be constant;
see [39] and [135]. The latter assumption works reasonably well if we consider
only a short period of time during which interest rates remain approximately
the same, while it causes important discrepancies over a long time span. For
example, interest rates on one-year U.S. Treasury bills ranged from as high as
15% in the early 1980s to less than 1% in 2003. Interest rates are
not tradable assets, while derivatives (for example, interest rate swaps, futures,
bond options) written on them are. In fact, the interest rate derivatives market
is the largest derivatives market with an average daily turnover of about 60
trillion dollars; see [SI].

To price interest rate derivatives, Vasicek [156] proposes the following model
for interest rates $r_t$:

$$dr_t = \beta(\alpha - r_t)\,dt + \sigma\,dW_t, \quad \alpha > 0, \ \beta > 0. \tag{2.3}$$

The quantity $\alpha$ determines the long-run average interest rate. If $r_t > \alpha$, then
$\beta(\alpha - r_t) < 0$ pulls interest rates downward; while when $r_t < \alpha$, $\beta(\alpha - r_t) > 0$
pushes rates upward. Therefore, (2.3) has a mean-reversion interpretation
with $\alpha$ being the mean value and $\beta$ the strength of the mean-reversion. Model
(2.3) is the well-known Ornstein-Uhlenbeck process whose solution, assuming
$r_0$ is Gaussian distributed with mean $\alpha$ and variance $\sigma^2/(2\beta)$, is a stationary
Gaussian process fully characterized by the mean $\alpha$ and covariance function

$$\mathrm{Cov}(r_s, r_t) = \frac{\sigma^2}{2\beta}e^{-\beta|s-t|}.$$
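
The covariance formula can be checked numerically. The sketch below samples (2.3) exactly on a grid, using the standard property of the Ornstein-Uhlenbeck process that, given $r_t$, $r_{t+\Delta}$ is Gaussian with mean $\alpha + (r_t - \alpha)e^{-\beta\Delta}$ and variance $\sigma^2(1 - e^{-2\beta\Delta})/(2\beta)$. The function names `ou_cov` and `simulate_vasicek` and all parameter values are illustrative assumptions.

```python
import math
import random

def ou_cov(s, t, beta, sigma):
    """Stationary covariance Cov(r_s, r_t) = sigma^2 / (2 beta) * exp(-beta |s - t|)."""
    return sigma ** 2 / (2.0 * beta) * math.exp(-beta * abs(s - t))

def simulate_vasicek(alpha, beta, sigma, delta, n, rng):
    """Exact sampling of (2.3) at times 0, delta, ..., n * delta, started from the stationary law."""
    phi = math.exp(-beta * delta)
    step_sd = math.sqrt(sigma ** 2 * (1.0 - phi ** 2) / (2.0 * beta))
    r = rng.gauss(alpha, math.sqrt(sigma ** 2 / (2.0 * beta)))
    path = [r]
    for _ in range(n):
        r = alpha + phi * (r - alpha) + step_sd * rng.gauss(0.0, 1.0)
        path.append(r)
    return path

rng = random.Random(1)
alpha, beta, sigma, delta = 0.05, 0.5, 0.02, 0.5      # illustrative values
path = simulate_vasicek(alpha, beta, sigma, delta, 200000, rng)
mean = sum(path) / len(path)
lag1 = sum((x - mean) * (y - mean) for x, y in zip(path, path[1:])) / (len(path) - 1)
print(mean, alpha)                                    # sample mean versus alpha
print(lag1, ou_cov(0.0, delta, beta, sigma))          # sample versus theoretical lag-one covariance
```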

Model (2.3) assumes constant volatility $\sigma$, which hardly matches empirical
observations. For example, volatilities tend to be clustered and larger observations
are associated with larger volatilities. A more realistic model would take
into account the non-constant volatility. Cox, Ingersoll and Ross [53] derive the
"CIR" model

$$dr_t = \beta(\alpha - r_t)\,dt + \sigma r_t^{1/2}\,dW_t. \tag{2.4}$$

Assuming that $2\alpha\beta > \sigma^2$, (2.4) admits a non-negative solution $r_t$ that possesses a
noncentral chi-square transition density, while the marginal density is a Gamma
density. Another model for interest rates, used in Courtadon [52], is
$dr_t = \beta(\alpha - r_t)\,dt + \sigma r_t\,dW_t$. Chan, Karolyi, Longstaff and Sanders [48] further
extend the "CIR" model to the "CKLS" model,

$$dr_t = \beta(\alpha - r_t)\,dt + \sigma r_t^{\gamma}\,dW_t. \tag{2.5}$$

For other specifications of interest rate models, see [45; 51; 62; 133]. Aït-Sahalia
[2] finds that the 7-day Euro-dollar deposit rate has a strong nonlinear
mean-reversion only when the rate is beyond the range 4%-17%. To address this issue,
he proposes the nonlinear drift model

$$dr_t = (\alpha_0 + \alpha_1 r_t + \alpha_2 r_t^2 + \alpha_3/r_t)\,dt + \sqrt{\beta_0 + \beta_1 r_t + \beta_2 r_t^{\beta_3}}\,dW_t, \tag{2.6}$$

which includes all the aforementioned models as special cases.
A nonparametric one-factor diffusion model has the form

$$dX_t = \mu(X_t)\,dt + \sigma(X_t)\,dW_t. \tag{2.7}$$

Here $\{X_t\}$ might be stock prices, interest rates, the S&P 500 index, or other
financial quantities; $\mu(\cdot)$ and $\sigma(\cdot)$ are termed the drift or instantaneous return
function, and the diffusion, volatility, or instantaneous return variance function,
respectively. In particular, a parametric counterpart of (2.7) is

$$dX_t = \mu_\theta(X_t)\,dt + \sigma_\theta(X_t)\,dW_t, \tag{2.8}$$

where $(\mu_\theta, \sigma_\theta)$ is a known parametric specification with unknown parameter
$\theta \in \mathbb{R}^d$. By proper specifications of $(\mu_\theta, \sigma_\theta)$, we can recover all the parametric
models discussed above. Conditions for existence of weak solutions and strong
solutions have been derived in Karlin and Taylor [117]. If the price process is
assumed to follow the diffusion model (2.7), then among the key quantities of
interest are the unknown drift $\mu(\cdot)$, diffusion $\sigma(\cdot)$, and the probabilistic
properties of $X_t$. For option pricing, it is very important that the price model of
the underlying security be correctly specified parametrically (model validation)
or consistently estimated nonparametrically (nonparametric estimation). Such
model estimation and validation questions are addressed in Sections 5, 6 and 7.

2.2. Jump diffusion processes 

Until 1963, when Mandelbrot published his classic paper [132], there was little
doubt that log returns of stocks were independent Gaussian random variables.
Mandelbrot [132] studied cotton price changes and found that: (i) the histograms
of price changes are too peaked relative to Gaussian distributions; and (ii) the
tails of the distributions of the cotton price changes are so extraordinarily long
that it may be reasonable to assume that the second moment is infinite.
Mandelbrot [132] further argued that a good alternative model for cotton price
changes is the stable distribution with index 1.7, pioneering the approach of
modeling financial data using Lévy processes. See also the discussion in Fama [73].

A Lévy process is a stochastic process with right-continuous sample paths and
independent and stationary increments. Special examples of Lévy processes include
Brownian motions, Poisson processes, and stable processes, among others.
The latter two processes are jump-type processes. Here we focus on diffusion
processes with Poisson jumps. Such jump diffusion processes have been proposed
to capture the heavy-tailedness of returns. Models based on other types of Lévy
processes have also been proposed. See, for example, Eberlein and Keller
[66] and Eberlein et al. [67] for hyperbolic Lévy motions and their applications
in fitting German stock returns; Aït-Sahalia and Jacod [7; 8], Eberlein et al.
[65], Eberlein and Raible [68], Nolan [140], Woerner [159] for estimation under
Lévy process settings. Zhao and Wu [174] study nonparametric inference for
nonstationary processes driven by $\alpha$-stable Lévy processes; see also the references
therein.

The first attempt to incorporate jumps into diffusion models was made by
Merton [136]. The basic idea is that we may assume there are two types of
randomness driving the stock prices: the first is a Brownian motion generating
continuous sample paths and small movements, while the second is large but
infrequent jumps representing sudden shocks or news. In particular, [136] assumes
that stock prices follow the jump diffusion model

$$dS_t/S_{t-} = (\alpha - \lambda\kappa)\,dt + \sigma\,dW_t + J_t\,dN_t, \tag{2.9}$$

where $N_t$ is a counting Poisson process for jumps with intensity $\lambda$, $J_t$ is an
independent jump size if a jump occurs at $t$, the jumps are assumed to be
iid, and $\kappa = E(J_t)$. The inclusion of the coefficient $\lambda\kappa$ in the drift makes $S_t$
unpredictable.

By the Doléans-Dade formula, (2.9) admits the following solution

$$S_t/S_0 = \exp[(\alpha - \sigma^2/2 - \lambda\kappa)t + \sigma W_t]\prod_{i=1}^{N_t} J_{\tau_i},$$

with the convention $\prod_{i=1}^{0} = 1$, where $\tau_i, 1 \le i \le N_t$, denote the times at which
jumps occur. A typical choice of $J_t$ is a lognormal random variable so that,
conditionally on $N_t$, $S_t/S_0$ has a lognormal distribution. Merton [136] derives an
option pricing formula for
call options written on a security whose price process follows (2.9).
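
To make the structure of this solution concrete, the sketch below draws $S_T/S_0$ as a lognormal diffusion factor times a product of lognormal jump factors. Conventions differ on whether the jump variable denotes the multiplicative factor or the relative price change; the sketch assumes the former, with jump factor $Y$ satisfying $\log Y \sim N(m, s^2)$ and compensator $\kappa = E(Y) - 1$. These conventions, the function names and the parameter values are assumptions made for illustration. Under them, $E(S_T/S_0) = \exp(\alpha T)$.

```python
import math
import random

def poisson_count(lam, horizon, rng):
    """Number of arrivals of a rate-lam Poisson process in [0, horizon], via exponential waiting times."""
    count, t = 0, rng.expovariate(lam)
    while t <= horizon:
        count += 1
        t += rng.expovariate(lam)
    return count

def merton_terminal_ratio(a, sigma, lam, m, s, horizon, rng):
    """Draw S_T / S_0 with lognormal jump factors Y, log Y ~ N(m, s^2) (assumed convention)."""
    kappa = math.exp(m + 0.5 * s ** 2) - 1.0           # E(Y) - 1
    w = math.sqrt(horizon) * rng.gauss(0.0, 1.0)       # W_T
    log_ratio = (a - 0.5 * sigma ** 2 - lam * kappa) * horizon + sigma * w
    for _ in range(poisson_count(lam, horizon, rng)):
        log_ratio += rng.gauss(m, s)                   # multiply by one jump factor
    return math.exp(log_ratio)

rng = random.Random(2)
draws = [merton_terminal_ratio(0.08, 0.2, 1.0, -0.1, 0.15, 1.0, rng) for _ in range(40000)]
print(sum(draws) / len(draws), math.exp(0.08))         # compensated drift: mean near exp(a T)
```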

Over the past three decades, (2.9) has been extended in various directions by 
specifying different structures for the drift, diffusion, and jump components. For 
example, assuming that the magnitudes of the jumps are dependent, Oldfield 
et al. [141] propose an autoregressive jump diffusion model. Ball and Torous 
[22] replace the Poisson jump process by a Bernoulli jump process and argue 
that the latter process could yield more satisfactory empirical and theoretical 
analysis, including computational advantages and the attainment of the Cramer- 
Rao lower bound for maximum likelihood estimation. Ramezani and Zeng [143] 
and Kou [122] use an asymmetric double exponential distribution for $\log(J_t)$,
and show that the resulting model can capture asymmetric leptokurtic features
and "volatility smile" features frequently observed in financial data. In Bates
[32], the volatility $\sigma$ in (2.9) is assumed to be a stochastic process of a
mean-reverting type,

$$d\sigma_t^2 = (\theta - \beta\sigma_t^2)\,dt + s\sigma_t\,dW_t', \tag{2.10}$$

for another Brownian motion $\{W_t'\}$ which could be correlated with $\{W_t\}$. Jorion
[115] performs a significance test of jump components and concludes that
many exchange rates display significant jump components; see also Lee and
Mykland [124]. Duffie et al. [64] study option pricing for multi-dimensional
affine jump diffusion models in which the drift vector, volatility matrix and
jump components are assumed to be affine functions of the state variable $S_t$.
For other contributions on jump diffusion processes and their applications, see
[14; 21; 31; 33; 110; 111; 129] and references therein.

2.3. Continuous-time stochastic volatility models 

Stochastic volatility (SV) models have emerged as a powerful alternative to the
traditional deterministic volatility models. In contrast to deterministic volatility
models that assume the volatility is a deterministic function of the stock prices,
SV models assume that the volatility is also a stochastic process. The paper by
Hull and White [106] is among the first to study SV models. They consider the
model

$$dS_t = \mu S_t\,dt + \sigma_t S_t\,dW_t \quad\text{and}\quad d\sigma_t^2 = \beta\sigma_t^2\,dt + v\sigma_t^2\,dW_t', \tag{2.11}$$

where $\{W_t\}$ and $\{W_t'\}$ are two standard Brownian motions whose increments
have correlation $\rho$. The volatility $\{\sigma_t^2\}$ is a geometric Brownian motion. Hull
and White [106] find that the pricing formula for a European call option under
the SV model (2.11) behaves differently from the classical Black-Scholes (B-S)
formula. Under the B-S formula, at-the-money options tend to be overpriced while
deep-in-the-money and deep-out-of-the-money options tend to be underpriced.

Since Hull and White [106], various SV models have been proposed. For
example, Scott [146] introduces a mean-reverting Ornstein-Uhlenbeck process
for the volatility, $d\sigma_t = \beta(\alpha - \sigma_t)\,dt + v\,dW_t'$; Melino and Turnbull [134] assume
the CKLS type model [cf. (2.5)] $dS_t = \beta(\alpha - S_t)\,dt + \sigma_t S_t^{\gamma}\,dW_t$ for the Canadian
Dollar/U.S. Dollar spot exchange rates $S_t$ with the volatility process $\log(\sigma_t)$
being an Ornstein-Uhlenbeck process; see also Wiggins [158], and Andersen and
Lund [16].

In the high-frequency setting, instantaneous returns are usually negligible relative
to volatilities and hence can be taken to be zero. Under this setting, a general
nonparametric continuous-time diffusion model with stochastic volatility is

$$d\log(S_t) = \sigma_t\,dW_t, \quad d\log(\sigma_t^2) = r(\log(\sigma_t^2))\,dt + s(\log(\sigma_t^2))\,dW_t', \tag{2.12}$$

where $\{W_t\}$ and $\{W_t'\}$ are two standard Brownian motions with correlation
$\rho = \mathrm{Corr}(dW_t, dW_t')$. When $\rho < 0$, this model is often used to model the leverage
effect. When bad news is released, the equity price $S_t$ drops and $dW_t < 0$. The negative
correlation then implies that $dW_t'$ tends to be positive, and hence the volatility
$\sigma_t^2$ increases.
For example, Yu [165] proposes $d\log(\sigma_t^2) = [\alpha + \beta\log(\sigma_t^2)]\,dt + \sigma_v\,dW_t'$, and Omori
et al. [142] study its discrete version.

The aforementioned SV models are built out of Brownian motions and a 
natural extension is to replace Brownian motions by more general processes. 
Barndorff-Nielsen and Shephard [28] introduce another class of SV models based
on Lévy processes by assuming that the logarithm of an asset price follows

$$d\log(S_t) = (\mu + \beta\sigma_t^2)\,dt + \sigma_t\,dW_t \quad\text{and}\quad d\sigma_t^2 = -\lambda\sigma_t^2\,dt + dZ_{\lambda t}, \tag{2.13}$$

where $\lambda > 0$ and $\{Z_t\}$ is a general Lévy process with stationary and independent
increments. A Lévy process $\{Z_t\}$ other than Brownian motion has jumps, and
therefore the volatility $\{\sigma_t^2\}$ may exhibit big jumps. On the other hand, due to
the continuity of $\{W_t\}$, the asset price $\{S_t\}$ is still continuous. Barndorff-Nielsen
and Shephard [28] show that many specific SV models can be built out of (2.13)
by specifying either the marginal process $\{\sigma_t^2\}$ or the Lévy process $\{Z_t\}$.

In (2.12), let $X_i = \log(S_{i\Delta}) - \log(S_{(i-1)\Delta})$ be the aggregated log returns
during the time period $[(i-1)\Delta, i\Delta]$. The unobserved stochastic volatility process
$\{\sigma_t^2\}_{t\ge0}$ is a stationary Markov process. However, the returns $\{X_i\}$ do not form a
Markov chain as in the deterministic volatility model (2.7). Instead, when $\{W_t\}$
and $\{W_t'\}$ are independent, they form a hidden Markov model; see Section 4.4.
Finally, we point out that it is common practice to use a discrete version of
(2.12) to facilitate computational and theoretical derivations; see Section 3.2.
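
The dependence structure described here can be seen in a small simulation. The sketch below assumes one particular discrete specification, namely returns $X_i = \sigma_i\eta_i$ whose log variance follows a Gaussian AR(1), which is a model of the kind discussed in Section 3.2; the function names and parameter values are illustrative assumptions. The returns are serially uncorrelated, while their squares are clearly autocorrelated because of the latent volatility.

```python
import math
import random

def simulate_log_sv(lam, sd_eps, n, rng):
    """X_i = sigma_i * eta_i, with log(sigma_i^2) following a Gaussian AR(1) with coefficient lam."""
    h = rng.gauss(0.0, sd_eps / math.sqrt(1.0 - lam ** 2))    # stationary start for log(sigma^2)
    returns = []
    for _ in range(n):
        h = lam * h + rng.gauss(0.0, sd_eps)                  # latent log-variance update
        returns.append(math.exp(0.5 * h) * rng.gauss(0.0, 1.0))
    return returns

def lag1_autocorr(values):
    """Sample lag-one autocorrelation."""
    mean = sum(values) / len(values)
    num = sum((a - mean) * (b - mean) for a, b in zip(values, values[1:]))
    den = sum((a - mean) ** 2 for a in values)
    return num / den

rng = random.Random(5)
x = simulate_log_sv(0.95, 0.2, 100000, rng)
print(lag1_autocorr(x))                       # close to zero: returns are uncorrelated
print(lag1_autocorr([v * v for v in x]))      # clearly positive: volatility clustering
```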

3. Discrete-time models 

So far we have discussed continuous-time models. Another powerful tool for
studying the dynamics of variables in financial markets is time series analysis. In
practice, all continuous-time models are observed at discrete times; therefore we
may model these discretely observed measurements using time series models. In
fact, although (2.7) is written in a continuous-time form, one often
uses the following Euler discretization scheme

$$X_{t+\Delta} - X_t = \mu(X_t)\Delta + \sigma(X_t)(W_{t+\Delta} - W_t), \quad t = 0, \Delta, 2\Delta, \ldots,$$

as an approximation to facilitate computational and theoretical derivation. The
accuracy of such Euler discretization is studied in Jacod and Protter [107]. We
devote this section to reviewing discrete time series models.
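
A minimal sketch of the Euler scheme for a generic one-factor diffusion is given below. The drift and diffusion functions are passed as callables; the function name `euler_path` and the Vasicek-type test case with its parameter values are illustrative assumptions.

```python
import math
import random

def euler_path(mu, sigma, x0, delta, n, rng):
    """Euler scheme: X_{t+delta} = X_t + mu(X_t) * delta + sigma(X_t) * (W_{t+delta} - W_t)."""
    x = x0
    path = [x]
    for _ in range(n):
        dw = math.sqrt(delta) * rng.gauss(0.0, 1.0)    # Brownian increment ~ N(0, delta)
        x = x + mu(x) * delta + sigma(x) * dw
        path.append(x)
    return path

# Test case: mu(x) = beta * (alpha - x) and constant sigma, as in (2.3).
alpha, beta, sig = 0.05, 0.5, 0.02
rng = random.Random(3)
path = euler_path(lambda x: beta * (alpha - x), lambda x: sig, 0.10, 1.0 / 252, 252, rng)
print(len(path), path[-1])
```

With the diffusion switched off, the recursion reduces to $X_n - \alpha = (X_0 - \alpha)(1 - \beta\Delta)^n$, which gives a simple deterministic check of the implementation.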

3.1. Nonlinear autoregressive and stochastic regression models 

A discrete version of the continuous-time model (2.7) is the nonparametric
autoregressive conditional heteroscedastic (NARCH) model

$$X_i = \mu(X_{i-1}) + \sigma(X_{i-1})\varepsilon_i, \tag{3.1}$$

where $\varepsilon_i, i \in \mathbb{Z}$, are iid random variables. If $\sigma(\cdot)$ is a constant function, then
(3.1) is called a nonparametric autoregressive (NAR) model. Special cases of
(3.1) include the linear AR model $X_i = aX_{i-1} + \varepsilon_i$, the threshold AR [Tong [154]]
model $X_i = a\max(X_{i-1}, 0) + b\min(X_{i-1}, 0) + \varepsilon_i$, and the exponential AR [Haggan and
Ozaki [93]] model $X_i = [a + b\exp(-cX_{i-1}^2)]X_{i-1} + \varepsilon_i$, among others. The latter
three models have constant conditional variances. In an attempt to model
United Kingdom inflation during the time period 1958-1977, Engle [69] proposes
the class of autoregressive conditional heteroscedastic (ARCH) models. The
essential idea of ARCH models is that the conditional variances are non-constant
and change as time evolves. In particular, the ARCH model of order one has
the form

$$X_i = \sqrt{a_0^2 + a_1^2 X_{i-1}^2}\,\varepsilon_i, \quad a_0 > 0, \ 0 < a_1 < 1. \tag{3.2}$$

Model (3.1) can generate heavy-tailed distributions. To see this, consider
the simple model $X_i = \sigma(X_{i-1})\varepsilon_i$, and assume that $\varepsilon_0$ has the standard normal
distribution. Then $E(\varepsilon_i^4) = 3$ and, by Jensen's inequality,

$$\frac{E(X_i^4)}{[E(X_i^2)]^2} = \frac{E[\sigma^4(X_{i-1})\varepsilon_i^4]}{[E(\sigma^2(X_{i-1})\varepsilon_i^2)]^2} = \frac{3E[\sigma^4(X_{i-1})]}{[E(\sigma^2(X_{i-1}))]^2} \ge 3.$$

For example, consider the ARCH(1) model in (3.2). It is easy to show that
$E(X_i^4)/[E(X_i^2)]^2 = 3(1 - a_1^4)/(1 - 3a_1^4) > 3$ if $3a_1^4 < 1$ and $E(X_i^4)/[E(X_i^2)]^2 = \infty$
if $3a_1^4 \ge 1$. The heavy-tailedness implied by model (3.1) makes it a successful
candidate in many financial applications where it is frequently observed
that returns exhibit heavy tails; see [155].
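
The kurtosis formula for the ARCH(1) model can be compared with a simulation. The sketch below assumes standard normal errors and the parameterization in which the conditional variance of $X_i$ is $a_0^2 + a_1^2 X_{i-1}^2$; the function names, the burn-in length and the parameter values are illustrative assumptions.

```python
import math
import random

def arch1_kurtosis(a1):
    """Theoretical E(X^4) / [E(X^2)]^2 for ARCH(1) with standard normal errors."""
    if 3.0 * a1 ** 4 >= 1.0:
        return math.inf
    return 3.0 * (1.0 - a1 ** 4) / (1.0 - 3.0 * a1 ** 4)

def simulate_arch1(a0, a1, n, rng, burn_in=500):
    """Simulate X_i = sqrt(a0^2 + a1^2 * X_{i-1}^2) * eps_i with iid N(0, 1) errors eps_i."""
    x, out = 0.0, []
    for i in range(n + burn_in):
        x = math.sqrt(a0 ** 2 + a1 ** 2 * x ** 2) * rng.gauss(0.0, 1.0)
        if i >= burn_in:                      # discard the initial transient
            out.append(x)
    return out

rng = random.Random(4)
xs = simulate_arch1(1.0, 0.5, 200000, rng)
m2 = sum(x ** 2 for x in xs) / len(xs)
m4 = sum(x ** 4 for x in xs) / len(xs)
sample_kurtosis = m4 / m2 ** 2
print(sample_kurtosis, arch1_kurtosis(0.5))   # both exceed the Gaussian value 3
```

Sample fourth moments converge slowly for heavy-tailed data, so the simulated ratio should be read as a rough check rather than a precise estimate.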

A more general version of (3.1) is the following stochastic regression model
of order one

$$X_i = \mu(Y_i) + \sigma(Y_i)\varepsilon_i. \tag{3.3}$$

Here $Y_i$ and $X_i$ are the covariate variable and response, respectively, and the
error $\varepsilon_i$ is independent of $Y_j, j \le i$. In the special case of $Y_i = X_{i-1}$, (3.3) reduces
to (3.1). Depending on the context, we may model $(Y_i)_{i\in\mathbb{N}}$ as a sequence of either
iid random variables or time series. For example, if $Y_i$ is the measurement for the
$i$-th subject, then we may assume that $(Y_i)_{i\in\mathbb{N}}$ are iid. On the other hand, if $Y_i$
is the measurement for a subject at time $i$, then it is natural to assume that
$(Y_i)_{i\in\mathbb{N}}$ form a time series. In the latter case, a possible model for $Y_i$ might be
the NARCH model

$$Y_i = \mu(Y_{i-1}) + \sigma(Y_{i-1})\eta_i, \tag{3.4}$$

where $\eta_i, i \in \mathbb{Z}$, are iid random variables. To ensure that $\varepsilon_i$ is independent of
$Y_j, j \le i$, we assume that $\varepsilon_i$ is independent of $\eta_j, j \le i$. See Zhao and Wu
[172; 173] and references therein.

Due to the so-called curse of dimensionality, it is practically infeasible to
extend the nonparametric model (3.1) to orders beyond two. Other extensions
of model (3.1) include the ARCH model of order p, the generalized ARCH (GARCH)
model in Bollerslev [40] and the exponential GARCH (EGARCH) model in Nelson
[139], to name a few. For extensive expositions of applications of ARCH, GARCH
models and their variants in financial econometrics, see the survey papers [34;
41; 42; 57; 148] and the books by Gourieroux [92] and Tsay [155] (Chapter 3).
A semi-parametric approach is studied in Gao [87].

3.2. Discrete-time stochastic volatility models

An Euler discretization of (2.12) is the following discrete-time stochastic
volatility model

$$X_i = \sigma_i\eta_i \quad\text{and}\quad \sigma_i^2 = r(\sigma_{i-1}^2) + s(\sigma_{i-1}^2)\varepsilon_i, \tag{3.5}$$

where $\{\varepsilon_i\}_{i\in\mathbb{Z}}$ and $\{\eta_i\}_{i\in\mathbb{Z}}$ are two iid sequences. To guarantee positivity of the
volatility $\sigma_i^2$, it is common to model the logarithm of volatility instead. For
example, Taylor [152] proposes an AR(1) model for $\log(\sigma_i^2)$:

$$X_i = \sigma_i\eta_i \quad\text{and}\quad \log(\sigma_i^2) = \lambda\log(\sigma_{i-1}^2) + \varepsilon_i. \tag{3.6}$$

Note that a constant term on the right-hand side of (3.6) is unnecessary since such
a term can always be absorbed into $\eta_i$ in (3.5). In (3.6), the innovations $\varepsilon_i, i \in \mathbb{Z}$,
are assumed to be iid normal with mean zero and variance $\sigma_\varepsilon^2$ and independent
of $\eta_i$ in (3.5). The variance $\sigma_\varepsilon^2$ of $\varepsilon_i$ measures the uncertainty of future volatility.
In the special case of $\sigma_\varepsilon^2 = 0$ and $\lambda = 1$, the volatility is a deterministic constant.
Thanks to the linear autoregressive relationship, (3.6) is often called the ARSV(1)
model or the lognormal stochastic autoregressive volatility (SARV) model. Due to
its simple structure and mathematical tractability, (3.6) has been extensively
studied in the literature; see Broto and Ruiz [46], Shephard [148], Taylor [153],
and references therein. Ball and Torous [22] incorporate the discrete version of
Chan et al. [48] and propose the following SV model for interest rates:

$$r_t = (\alpha + \beta)r_{t-1} + \sigma_{t-1}r_{t-1}^{\gamma}e_t \quad\text{and}\quad \log\sigma_t = \rho\log\sigma_{t-1} + v(1 - \rho) + \eta_t. \tag{3.7}$$

Wiggins [158] also uses a similar model under the continuous-time setting.

In (3.6), a common choice for the density of $\eta_i$ is the standard normal density
$\phi$. Then the conditional distribution of $X_i$ given $\sigma_i$ is Gaussian. Therefore,
the marginal density, denoted by $f_X$, of $X_i$ is a mixture of normal densities
$\sigma^{-1}\phi(x/\sigma)$ with respect to the density $f_\sigma$ of $\sigma_i$: $f_X(x) = \int \sigma^{-1}\phi(x/\sigma)f_\sigma(\sigma)\,d\sigma$.
As in the case of the ARCH model in Section 3.1, $\{X_i\}$ from (3.6) exhibit heavier
tails than the normal errors $\{\eta_i\}$. Other heavy-tailed distributions of the
errors $\{\eta_i\}$ are studied in, for example, Harvey et al. [100], Barndorff-Nielsen
[27], Gallant et al. [85], and Liesenfeld and Jung [128].

Recently, model (3.6) with $(\eta_i, \varepsilon_{i+1}), i \in \mathbb{Z}$, forming iid copies of a bivariate
normal vector $(\eta, \varepsilon)$ with $\mathrm{Cov}(\eta, \varepsilon) < 0$, has been introduced to model the leverage
effect; see Yu [165] and Omori et al. [142]. For applications of SV models and
their estimation, see the survey papers by Broto and Ruiz [46], Ghysels et al.
[91], and Shephard [148].

4. Dependence structure of discrete samples

Let $\{X_t\}_{t\in T}$ be a generic process with time index $T$. For example, $\{X_t\}_{t\in T}$
might be the continuous-time models in Section 2 with $T = [0, \infty)$ or the
discrete-time models in Section 3 with $T = \{0, 1, 2, \ldots\}$. Regardless of whether
$T$ is continuous-time or discrete-time, in practice, the process $\{X_t\}_{t\in T}$ is only
observed at discrete time points, which could possibly be random. Moreover,
the observations could be contaminated with errors. One of the main objectives
in financial modeling is to study the dependence structure behind the
data-generating mechanism based on discrete observations. In this section we review
some main dependence features for discrete observations from financial models.

4.1. Markov chains

Markov chains are widely used in virtually every scientific field, including
biology, engineering, queueing theory, and physics, among others. In financial
econometrics, due to the independent and stationary increments of Brownian
motions, it is reasonable to expect the Markovian property to hold.
In fact, more is true: under some growth conditions on the
drift $\mu$ and diffusion $\sigma$, $\{X_t\}_{t\ge0}$ defined by the stochastic differential equation
(2.7) is strong Markovian [for a definition, see pp. 149-152 in Karlin and
Taylor [117]]. Therefore, discrete observations $\{X_i = X_{i\Delta}\}_{i\ge0}$ form a Markov
chain. Here $\Delta > 0$ is a small but fixed number representing the sampling interval.
The process $\{X_i\}_{i\ge0}$ in (3.1) is also Markovian.

The Markovian property plays an important role in statistical estimation
and inference. Let $\{X_i\}_{i\in\mathbb{Z}}$ be a stationary Markov chain. Denote by $\pi(x; \theta)$
and $p(x|x'; \theta)$ the marginal density function of $X_0$ and the transition density
function of $X_{i+1}$ at $x$ given $X_i = x'$, respectively. Here $\theta$ is the parameter. Given
observations $X_i, 0 \le i \le n$, the log likelihood function is given by

$$\ell(X_0, \ldots, X_n; \theta) = \sum_{i=1}^{n}\log[p(X_i|X_{i-1}; \theta)] + \log[\pi(X_0; \theta)] \approx \sum_{i=1}^{n}\log[p(X_i|X_{i-1}; \theta)]. \tag{4.1}$$

The latter is often termed the conditional likelihood, obtained by ignoring the
marginal density. Additionally, by the Markovian property, it suffices to study the
transition density of lag one. In fact, let $p_k(x|x')$ be the $k$-step transition density. Then
$p_k(x|x') = \int p_1(x''|x')p_{k-1}(x|x'')\,dx''$. Therefore, $p_k$ can be obtained recursively
from the one-step transition density $p_1$.
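
Both the conditional likelihood (4.1) and the recursion for $p_k$ can be illustrated with the Vasicek model (2.3), whose transition density over a lag $\Delta$ is Gaussian with mean $\alpha + (x' - \alpha)e^{-\beta\Delta}$ and variance $\sigma^2(1 - e^{-2\beta\Delta})/(2\beta)$. The sketch below evaluates the conditional log likelihood for a short series and checks numerically that the two-step density obtained from the recursion matches the one-step density over lag $2\Delta$. The function names, the integration grid, the parameter values and the observations are illustrative assumptions.

```python
import math

def transition_density(x, x_prev, alpha, beta, sigma, delta):
    """Gaussian transition density p(x | x_prev) of the Vasicek model (2.3) over a lag delta."""
    phi = math.exp(-beta * delta)
    var = sigma ** 2 * (1.0 - phi ** 2) / (2.0 * beta)
    mean = alpha + phi * (x_prev - alpha)
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def conditional_loglik(obs, alpha, beta, sigma, delta):
    """Right-hand side of (4.1): sum over i of log p(X_i | X_{i-1}; theta)."""
    return sum(math.log(transition_density(cur, prev, alpha, beta, sigma, delta))
               for prev, cur in zip(obs, obs[1:]))

def two_step_density(x, x_prev, alpha, beta, sigma, delta, grid=4001, half_width=0.5):
    """p_2(x | x') = integral of p_1(x'' | x') p_1(x | x'') dx'', by the trapezoidal rule."""
    lo = alpha - half_width
    step = 2.0 * half_width / (grid - 1)
    total = 0.0
    for j in range(grid):
        mid = lo + j * step
        weight = 0.5 if j in (0, grid - 1) else 1.0
        total += weight * (transition_density(mid, x_prev, alpha, beta, sigma, delta)
                           * transition_density(x, mid, alpha, beta, sigma, delta))
    return total * step

theta = (0.05, 0.5, 0.02)                      # (alpha, beta, sigma), illustrative values
obs = [0.060, 0.058, 0.061, 0.055, 0.052]      # made-up observations at spacing delta = 0.5
print(conditional_loglik(obs, *theta, 0.5))
print(two_step_density(0.04, 0.06, *theta, 0.5), transition_density(0.04, 0.06, *theta, 1.0))
```

In practice the conditional log likelihood would be maximized over $\theta$; the sketch only evaluates it at a fixed parameter value.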

4.2. Random samples and high-frequency financial data

In contrast to low-frequency financial data that are sampled regularly on a daily,
weekly or monthly basis, high-frequency financial data are usually sampled at
irregular random times. Let $0 = \tau_0 < \tau_1 < \cdots < \tau_{N_T} \le T$ be the times of the $N_T + 1$
discrete observations up to time $T$. Let $\Delta_i = \tau_i - \tau_{i-1}$ be the sampling intervals. In
the low-frequency setting, $\Delta_i = \Delta$ is assumed to be constant. In the high-frequency
framework, $\Delta_i, i = 1, 2, \ldots$, are assumed to be random variables. In fact, most
real financial market transactions arrive irregularly and randomly. Among many
examples are credit card purchases and stock buy/sell transactions.

To study the transaction times, Engle and Russell [70] propose a class of
autoregressive conditional duration (ACD) models given by

$$\Delta_i = \psi_i\varepsilon_i, \quad \psi_i = E(\Delta_i|\Delta_{i-1}, \Delta_{i-2}, \ldots, \Delta_1), \tag{4.2}$$

where $\varepsilon_i, i = 1, 2, \ldots$, are iid random variables independent of $\psi_i$. Depending on
the specification, (4.2) includes many examples, for instance, the
$m$-memory model $\psi_i = \gamma + \sum_{j=1}^{m}\alpha_j\Delta_{i-j}$ and the ACD($m, q$) model
$\psi_i = \gamma + \sum_{j=1}^{m}\alpha_j\Delta_{i-j} + \sum_{j=1}^{q}\beta_j\psi_{i-j}$. When $\varepsilon_i$ follows the standard exponential
distribution, the latter model is termed the exponential ACD (EACD) model. See [70] for more
details. Zhang et al. [169] further extend the linear ACD model to the threshold ACD
(TACD) model by allowing the coefficients in the ACD model to vary according
to the behavior of a threshold variable. Zhang et al. [169] use the generalized Gamma
distribution for $\varepsilon_i$, resulting in the GACD model, and find strong evidence that
stock dynamics behave differently during fast transaction periods and slow
transaction periods.
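
As a small illustration, the sketch below simulates durations from an EACD(1, 1) model, $\psi_i = \gamma + \alpha_1\Delta_{i-1} + \beta_1\psi_{i-1}$ with standard exponential errors. When $\alpha_1 + \beta_1 < 1$, taking expectations on both sides gives the stationary mean duration $\gamma/(1 - \alpha_1 - \beta_1)$. The function name, the burn-in length and the parameter values are illustrative assumptions.

```python
import random

def simulate_eacd11(gamma, a, b, n, rng, burn_in=1000):
    """Durations Delta_i = psi_i * eps_i with psi_i = gamma + a * Delta_{i-1} + b * psi_{i-1}."""
    psi = gamma / (1.0 - a - b)               # start at the stationary mean duration
    delta = psi
    durations = []
    for i in range(n + burn_in):
        psi = gamma + a * delta + b * psi     # conditional expected duration
        delta = psi * rng.expovariate(1.0)    # eps_i ~ standard exponential
        if i >= burn_in:
            durations.append(delta)
    return durations

rng = random.Random(6)
durations = simulate_eacd11(0.1, 0.1, 0.8, 200000, rng)
mean_duration = sum(durations) / len(durations)
print(mean_duration, 0.1 / (1.0 - 0.1 - 0.8))   # sample mean versus gamma / (1 - a - b)
```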

In addition to the randomness introduced by the underlying process, the
randomness from the random sampling also plays an important role in statistical
inference. In the presence of random sampling, one needs to consider likelihoods
for the bivariate observations $(X_i, \Delta_i)$. Aït-Sahalia and Mykland [9] argue that the
loss from not using sampling intervals is even greater than the loss due to the
discreteness of samples. Duffie and Glynn [63] study random samples from a
general Markov process.

4.3. Error-in samples

In practice, we may not observe $X_t$ directly but a contaminated version $X_t^*$
of it. For example, assume that $X_i$ is the actual log return of a stock
during the $i$-th time period, and $X_i^*$ is the observed return with errors.
This phenomenon is closely related to the market microstructure and becomes
more pronounced in the high-frequency setting. For example, for a continuous
semimartingale, it is well known that the realized volatility computed using
discrete observations converges in probability to the quadratic variation of the
semimartingale as the sampling frequency increases. This, however, contradicts
the empirical observation that realized volatility using high-frequency data
generally does not stabilize; see Brown [47]. One possible explanation for this
phenomenon is that the underlying process is contaminated with market
microstructure errors.

There are two popular market microstructure error models in the literature:
additive errors and rounding errors. The additive errors model assumes that
$X_i^* = X_i + \xi_i$ with error $\xi_i$. The errors $\{\xi_i\}$ are assumed to be iid and independent
of $\{X_i\}$. See, for example, Aït-Sahalia et al. [10], Hansen and Lunde [96], Zhang
[167], Zhang et al. [168], and Zhou [175], to name a few. For the rounding errors
model, $X_i^*$ is taken to be the nearest multiple of a smallest unit $a$ (say, 1 cent
in stock prices). That is, $X_i^* = a[X_i/a]$, where $[\cdot]$ denotes the integer rounding
operation. See Delattre and Jacod [59] and Zeng [166]. Li and Mykland [126]
study a more general error model via a Markov kernel:

$$P(X_{\tau_i}^* \le x \mid \{X_t\}_{t\ge0}) = P(X_{\tau_i}^* \le x \mid X_{\tau_i}) = Q(X_{\tau_i}, x). \tag{4.3}$$

That is, given $\{X_t\}_{t\ge0}$, the contaminated version $X_{\tau_i}^*$ only depends on $X_{\tau_i}$.

Observations with contaminated errors make statistical inference more difficult
in a few aspects. Let us consider the additive errors $X_i^* = X_i + \xi_i$. First, the
contaminated process $\{X_i^*\}_{i\ge0}$ may not form a Markov chain even if the original
series does. Instead, it becomes a hidden Markov chain with the hidden chain
$\{X_i\}_{i\ge0}$; see Section 4.4. For likelihood based methods, we need to integrate
out the unobservable process $\{X_i\}_{i\ge0}$. Second, working with contaminated
observations is essentially a deconvolution problem: extracting information about
$X_i$ based on $X_i^*$. The latter problem is usually quite difficult. For example,
nonparametric kernel density estimators for the density function of $X_i$ have very
slow rates of convergence; see Stefanski and Carroll [151], Liu and Taylor [130],
and Fan [74]. Third, volatility computed from $\{X_i^*\}_{i\ge0}$ has two components:
volatility from the true process $\{X_i\}_{i\ge0}$ and from the errors $\{\xi_i\}_{i\ge0}$. The latter term
represents the bias and needs to be taken care of; see Aït-Sahalia et al. [10],
Zhang [167], and Zhang et al. [168].
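
The third point can be made concrete with a small simulation. The sketch below assumes that the true log price is a Brownian motion with constant volatility observed at $n$ equally spaced times on $[0, 1]$, and that the additive errors are iid Gaussian with standard deviation $\omega$, independent of the price. Under these assumptions the expected realized variance of the contaminated series is $\sigma^2 + 2n\omega^2$, so the bias grows with the sampling frequency. The function name and all numerical values are illustrative.

```python
import math
import random

def realized_variance(values):
    """Sum of squared increments of a discretely observed series."""
    return sum((b - a) ** 2 for a, b in zip(values, values[1:]))

rng = random.Random(7)
sigma, omega, n = 0.2, 0.001, 23400       # volatility, noise sd, number of increments on [0, 1]
x = [0.0]
for _ in range(n):
    x.append(x[-1] + sigma * math.sqrt(1.0 / n) * rng.gauss(0.0, 1.0))
x_star = [v + rng.gauss(0.0, omega) for v in x]     # additive errors on every observation

rv_clean = realized_variance(x)
rv_noisy = realized_variance(x_star)
print(rv_clean, sigma ** 2)                          # near the quadratic variation sigma^2
print(rv_noisy, sigma ** 2 + 2 * n * omega ** 2)     # inflated by about 2 * n * omega^2
```

With these values the noise term $2n\omega^2$ is of the same order as $\sigma^2$ itself, even though each individual error is small.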

4.4. Hidden Markov models

The Markov chain assumption works well for deterministic volatility models,
that is, models in which the volatility $\sigma$ is a deterministic function of the state
variable $X_t$. Examples include the continuous-time model (2.7) and the nonlinear
autoregressive model (3.1). In many applications, however, the Markov chain
assumption is too restrictive. For example, in (2.12), let
$X_i = \log(S_{i\Delta}) - \log(S_{(i-1)\Delta})$ be the aggregated log returns during the time
period $[(i-1)\Delta, i\Delta]$. Because the volatility
itself is an unobserved stochastic process with serial dependence, $\{X_i\}_{i\ge0}$ does
not form a Markov chain. Similarly, $\{X_i\}_{i\ge0}$ from the discrete-time stochastic
volatility model (3.5) is not Markovian. For stochastic volatility models, hidden
Markov models (HMM) offer a good alternative; see Genon et al. [90] and Zhao
[170]. Following Bickel and Ritov [37], Leroux [125] and Zhao [170], we give a
definition of HMM.

Definition 1. A stochastic process $\{X_i\}_{i\in\mathbb{Z}}$ with state space
$(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ is a hidden Markov model with respect to the hidden
chain $\{Y_i\}_{i\in\mathbb{Z}}$ with state space $(\mathcal{Y}, \mathcal{B}(\mathcal{Y}))$ if

(i) $\{Y_i\}_{i\in\mathbb{Z}}$ is a strictly stationary Markov chain.

(ii) For all $i$, given $\{Y_j\}_{j\le i}$, $\{X_j\}_{j\le i}$ are conditionally independent,
and the conditional distribution of $X_i$ depends only on $Y_i$.

(iii) The conditional distribution of $X_i$ given $Y_i = y$ does not depend on $i$.

If $\{X_i\}_{i\in\mathbb{Z}}$ itself is a stationary Markov chain, then it is also a HMM
with respect to the observable Markov chain $\{Y_i = X_{i-1}\}_{i\in\mathbb{Z}}$. Therefore,
HMM includes Markov chain as a special case. Zhao [170] has shown that many
continuous-time and discrete-time models used in financial econometrics are
special examples of HMM. For example,

• In (2.7), $\{X_i\}$ is a HMM with respect to the observable chain $\{Y_i = X_{i-1}\}$.

• In (2.12), the aggregated log returns $\{X_i = \log(S_{i\Delta}) - \log(S_{(i-1)\Delta})\}$ is a
HMM with respect to the unobservable chain $\{Y_i = (\sigma_{t+(i-1)\Delta})_{t\in[0,\Delta]}\}$
or $\{Y_i = (\sigma^2_{i\Delta}, \int_{(i-1)\Delta}^{i\Delta}\sigma_t^2\,dt)\}$.

• In (3.5), $\{X_i\}$ is a HMM with respect to the unobservable Markov chain
$\{Y_i = \sigma_i\}$.

• In (3.3) and (3.4), {Xi} is a HMM with respect to the observable chain 

HMM can also be used to describe observations with contaminated errors in
Section 4.3. For example, consider the additive error model $X_i^* = X_i + \xi_i$, where
the errors $\{\xi_i\}_{i\ge 0}$ are assumed to be iid and independent of $\{X_i\}_{i\ge 0}$.
Clearly, if $\{X_i\}_{i\ge 0}$ is a Markov chain, then $\{X_i^*\}_{i\ge 0}$ is a HMM with
respect to the unobservable chain $\{X_i\}_{i\ge 0}$. Examples satisfying this condition
include models (2.7), (3.1), and the hyperbolic Lévy motion model in [67]. On the
other hand, the HMM structure may still hold even if $\{X_i\}_{i\ge 0}$ does not form a
Markov chain. For example, consider the stochastic volatility model (3.5). Then
$\{X_i\}_{i\ge 0}$ is not a Markov chain, but $\{X_i^*\}_{i\ge 0}$ is still a HMM with respect
to the unobservable chain $\{Y_i = \sigma_i\}_{i\ge 0}$, provided that $\eta_i$ and
$\varepsilon_i$ in (3.5) and the errors $\xi_i$ are independent. A similar statement holds
true for the continuous-time stochastic volatility model in Section 2.3. Genon et
al. [90] and Zhao [170] also show that certain dependence structures (for example,
mixing properties) of the hidden chain $\{Y_i\}$ carry over to $\{(X_i, Y_i)\}$, and
hence many tools for Markov chains are also applicable to HMM.

5. Model estimation: parametric methods 

In this section we review some popular parametric estimation methods in
financial econometrics. When we have a sufficient amount of prior information
about the underlying model, for example, the model is from a parametric family
$\{M_\theta, \theta \in \Theta\}$, where $M_\theta$ is a known parametric form with unknown
parameter $\theta$, then the main focus becomes the estimation of the parameter
$\theta$. Parametric methods address estimation problems in such contexts.

5.1. Likelihood based method 

Given observations $\{X_i\}_{0\le i\le n}$, if we know the parametric form of the model
that generates $\{X_i\}_{0\le i\le n}$, then maximum likelihood is the natural method.
Suppose that $\{X_i\}_{i\ge 0}$ form a stationary Markov chain with invariant density
$\pi(x;\theta)$ and transition density $p(x|x';\theta)$. Then the log likelihood
function is

$$\ell(X_0,\ldots,X_n;\theta) = \sum_{i=1}^n \log[p(X_i|X_{i-1};\theta)] + \log[\pi(X_0;\theta)] \approx \sum_{i=1}^n \log[p(X_i|X_{i-1};\theta)]. \tag{5.1}$$

The maximum likelihood estimate (MLE) is
$\hat\theta = \arg\max_\theta \ell(X_0,\ldots,X_n;\theta)$. Consider, for example, model
(2.1). Let $X_i = \log(S_{i\Delta}) - \log(S_{(i-1)\Delta})$ be the aggregated returns during
$[(i-1)\Delta, i\Delta]$. Then $\{X_i\}_{1\le i\le n}$ are iid normal random variables with
mean $(\mu - \sigma^2/2)\Delta$ and variance $\sigma^2\Delta$. Therefore, we have an explicit
form in (5.1). For the Vasicek model (2.3), $\{X_i = r_{i\Delta}\}_{i\ge 0}$ have Gaussian
transition density with mean $(1-\rho)\alpha + \rho X_{i-1}$ and variance
$\sigma^2(1-\rho^2)/(2\beta)$, where $\rho = \exp(-\beta\Delta)$. For the CIR model (2.4), the
transition density is a noncentral chi-square distribution with parameters fully
determined by $\alpha$, $\beta$ and $\sigma$. It is also easy to write down the transition
density for many parametric time series models. For example, in (3.1), let
$\varepsilon_i$ be iid standard normals. Denote by $\phi(x)$ the standard normal density.
Then $p(X_i|X_{i-1};\theta) = \phi\{[X_i - \mu_\theta(X_{i-1})]/\sigma_\theta(X_{i-1})\}/\sigma_\theta(X_{i-1})$.
For many continuous-time models, however, one practical issue arises. Except for
models (2.1), (2.3) and (2.4), transition densities for many other parametric
models do not admit closed forms. One way out is to use the following Euler
approximation scheme for (2.7):

$$X_{t+\Delta} \approx X_t + \mu(X_t)\Delta + \sigma(X_t)\Delta^{1/2}\varepsilon_t, \quad t = 0, \Delta, 2\Delta, \ldots, \tag{5.2}$$

where $\{\varepsilon_{i\Delta}\}_{i\ge 0}$ are iid standard normals. In fact, most continuous-time
models used in finance are estimated based on the approximation (5.2). The
approximation works well in high-frequency settings. An alternative approach is
the approximation method in Aït-Sahalia [3; 4], where the likelihood is
approximated by a sequence of likelihoods based on Hermite polynomials. See the
survey paper by Aït-Sahalia [5] on likelihood methods for (2.8) and its
multivariate version.

For hidden Markov models in Section 4.4, since the hidden chain
$\{Y_i\}_{0\le i\le n}$ is not observable, we need to integrate out $\{Y_i\}_{0\le i\le n}$ in
order to obtain the likelihood function for the observations $\{X_i\}_{0\le i\le n}$. To
be precise, denote by $f(x|y;\theta)$ the conditional density of $X_i$ at $x$ given
$Y_i = y$, by $Q(\cdot|y;\theta)$ the transition probability measure of $Y_i$ given
$Y_{i-1} = y$, and by $Q(\cdot;\theta)$ the invariant probability measure of $Y_0$. Then the
likelihood function for observations $\{X_i\}_{0\le i\le n}$ is

$$L(X_0,\ldots,X_n;\theta) = \int \prod_{i=0}^n f(X_i|y_i;\theta)\, Q(dy_0;\theta) \prod_{i=1}^n Q(dy_i|y_{i-1};\theta). \tag{5.3}$$

Due to the high dimensional integral, direct computation and maximization of
$L(X_0,\ldots,X_n;\theta)$ is computationally infeasible. This makes estimation of
stochastic volatility models quite difficult; see Section 5.4.
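
For models with closed-form transitions, by contrast, the approximate log likelihood in (5.1) is easy to maximize. The sketch below is an illustration under stated assumptions rather than a reference implementation: it simulates the Vasicek model through its exact Gaussian transition and recovers $(\alpha, \beta, \sigma)$ from the least-squares fit of $X_i$ on $X_{i-1}$, which coincides with the conditional Gaussian MLE. The function name `vasicek_mle` and all parameter values are hypothetical.

```python
import numpy as np

def vasicek_mle(x, delta):
    """Conditional MLE of (alpha, beta, sigma) from the Gaussian AR(1) transition."""
    x0, x1 = x[:-1], x[1:]
    # Regressing X_i on X_{i-1}: slope rho = exp(-beta*delta), intercept (1 - rho)*alpha.
    rho = np.cov(x0, x1, bias=True)[0, 1] / np.var(x0)
    alpha = (x1.mean() - rho * x0.mean()) / (1.0 - rho)
    resid = x1 - ((1.0 - rho) * alpha + rho * x0)
    beta = -np.log(rho) / delta
    # The transition variance equals sigma^2 (1 - rho^2) / (2 beta); solve for sigma.
    sigma = np.sqrt(2.0 * beta * np.mean(resid ** 2) / (1.0 - rho ** 2))
    return float(alpha), float(beta), float(sigma)

rng = np.random.default_rng(1)
alpha, beta, sigma, delta, n = 0.05, 0.5, 0.02, 1.0 / 12, 20000   # hypothetical values
rho = np.exp(-beta * delta)
sd = sigma * np.sqrt((1.0 - rho ** 2) / (2.0 * beta))
x = np.empty(n + 1)
x[0] = alpha
for i in range(n):                     # exact transition, so no discretization error
    x[i + 1] = (1.0 - rho) * alpha + rho * x[i] + sd * rng.standard_normal()

alpha_hat, beta_hat, sigma_hat = vasicek_mle(x, delta)
```

Because the transition is exact, the estimates do not rely on $\Delta$ being small, in contrast to estimation based on the Euler scheme (5.2).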
5.2. Generalized method of moments

The generalized method of moments (GMM; Hansen [95]) is a popular parameter
estimation method in finance. Assume that we have a stationary process
$\{X_t\}_{t\ge 0}$ whose data-generating mechanism involves parameter $\theta$. The
essential idea of GMM works as follows:

(a) Derive a set of theoretical moment conditions. That is, for a properly chosen
function $g_\theta$, find a constant $C_{g,\theta}$ such that

$$E[g_\theta(X_n)] = C_{g,\theta}, \quad \text{or} \quad E[\tilde g_\theta(X_n)] = 0 \ \text{with} \ \tilde g_\theta = g_\theta - C_{g,\theta}. \tag{5.4}$$

(b) Minimize a certain measure of the discrepancies between the empirical and
the theoretical moments. Namely, for a chosen criterion norm $\delta$,

$$\hat\theta = \arg\min_\theta \delta\Big[\frac{1}{n}\sum_{i=1}^n g_\theta(X_i) - C_{g,\theta}\Big] = \arg\min_\theta \delta\Big[\frac{1}{n}\sum_{i=1}^n \tilde g_\theta(X_i)\Big]. \tag{5.5}$$

For example, if $\delta(u) = u^2$, then we have a least-squares type estimate. In (5.5),
one often uses weighted discrepancies for a set of functions $g_\theta$. For large
sample properties of GMM, see Hansen [95].
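
Steps (a) and (b) can be sketched in a toy setting. The example below is an assumption-laden illustration, not a method from the cited papers: returns are taken to be iid $N(0, \theta)$, the two moment conditions are $E[X^2] = \theta$ and $E[X^4] = 3\theta^2$, the criterion is the unweighted sum of squared discrepancies, and the minimization is a plain grid search. The function name `gmm_variance` and the grid are hypothetical.

```python
import numpy as np

def gmm_variance(x, grid):
    """Grid-search GMM for theta = Var(X) using E[X^2] = theta and E[X^4] = 3 theta^2."""
    m2, m4 = np.mean(x ** 2), np.mean(x ** 4)          # empirical moments
    # Criterion: sum of squared discrepancies between empirical and theoretical moments.
    crit = (m2 - grid) ** 2 + (m4 - 3.0 * grid ** 2) ** 2
    return float(grid[np.argmin(crit)])

rng = np.random.default_rng(2)
theta_true = 0.04                                      # hypothetical variance
x = np.sqrt(theta_true) * rng.standard_normal(50000)
theta_hat = gmm_variance(x, np.linspace(0.001, 0.2, 20000))
```

With two conditions and one parameter the problem is over-identified, which is the situation where weighting the discrepancies, as mentioned above, starts to matter.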

The key step in GMM is step (a). We now introduce the idea in Hansen and
Scheinkman [97] to derive moment conditions for a stationary Markov process
$\{X_t\}_{t\ge 0}$. Let $\{\mathcal{J}_t\}_{t\ge 0}$ be a family of operators defined by
$\mathcal{J}_t g(x) = E[g(X_t)|X_0 = x]$. Notice that $\mathcal{J}_0 g(x) = g(x)$. The operators
$\{\mathcal{J}_t\}_{t\ge 0}$ uniquely determine the transition density of $X_t$ given $X_0$ by
taking $g(x) = \exp(ux)$, $u \in \mathbb{R}$. Introduce the infinitesimal generator
$\mathcal{L}$ of $X_t$ [see Karlin and Taylor [117] and Hansen and Scheinkman [97]], given
by

$$\mathcal{L}g(x) = \frac{d\mathcal{J}_t g(x)}{dt}\Big|_{t=0} = \lim_{t\downarrow 0}\frac{\mathcal{J}_t g(x) - g(x)}{t}. \tag{5.6}$$

By stationarity, $E[\mathcal{J}_t g(X_0)] = E[g(X_0)]$. Therefore, assuming that we can
exchange the order of expectation and differentiation, (5.6) implies that

$$E[\mathcal{L}g(X_0)] = 0. \tag{5.7}$$

The expression (5.7) holds for all functions $g$ satisfying some regularity
conditions. Thus, we can, in principle, produce infinitely many moment conditions.

By Itô's formula, the infinitesimal generator of $\{X_t\}_{t\ge 0}$ from the stochastic
differential equation (2.7) is given by

$$\mathcal{L}g(x) = \mu(x)\frac{\partial g(x)}{\partial x} + \frac{\sigma^2(x)}{2}\frac{\partial^2 g(x)}{\partial x^2}. \tag{5.8}$$

So, for a given parametrization $(\mu, \sigma) = (\mu_\theta, \sigma_\theta)$, we can use (5.8)
and GMM to estimate $\theta$; see Hansen and Scheinkman [97]. Duffie and Glynn [ihi]
apply GMM to estimate parameters based on random samples from a Markov process.
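
As a worked illustration (added here, not taken from Hansen and Scheinkman [97]), take the Vasicek-type specification $\mu_\theta(x) = \beta(\alpha - x)$ with constant $\sigma$; this choice is an assumption made only for the example. Applying (5.7) with the diffusion generator $\mathcal{L}g = \mu g' + \sigma^2 g''/2$ to $g(x) = x$ and $g(x) = x^2$ gives two moment conditions:

```latex
% g(x) = x:    \mathcal{L}g(x) = \beta(\alpha - x)
E[\beta(\alpha - X_0)] = 0
  \;\Longrightarrow\; E[X_0] = \alpha,
\\
% g(x) = x^2:  \mathcal{L}g(x) = 2\beta x(\alpha - x) + \sigma^2
E[2\beta X_0(\alpha - X_0) + \sigma^2] = 0
  \;\Longrightarrow\; \mathrm{Var}(X_0) = \frac{\sigma^2}{2\beta}.
```

Both conditions involve $\beta$ and $\sigma$ only through the ratio $\sigma^2/\beta$, so in this example moment conditions built from (5.7) alone cannot separate the two parameters.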
Condition (5.7) only uses information from marginal stationarity. A more
efficient approach would incorporate transition or conditional information into
moment conditions. Since most transition densities, except in a few rare cases,
do not have explicit forms, any approach relying on transition information needs
some approximation technique. For example, for the CKLS model (2.5), we can apply
the Euler approximation scheme to obtain approximate moment conditions:

$$E(r_t) = \alpha, \quad E(\varepsilon_{t+\Delta}|r_t) \approx 0, \quad E(\varepsilon^2_{t+\Delta}|r_t) \approx \sigma^2 r_t^{2\gamma}\Delta, \tag{5.9}$$

where $\varepsilon_{t+\Delta} = r_{t+\Delta} - r_t - \beta(\alpha - r_t)\Delta$.

In (5.4), the assumption that $C_{g,\theta}$ needs to be of a known form significantly
limits the applicability of the GMM estimator. To overcome this difficulty,
Duffie and Singleton [(il] introduce a simulated moments estimation method
which estimates the parameters of interest by matching the sample moments of
the actual and simulated processes.

5.3. Other parameter estimation methods for diffusion models

We briefly mention other parameter estimation methods for model (2.8). Density
based parameter estimation minimizes the discrepancy between the nonparametric
density estimate and the theoretical parametric density or its parametric
estimate. Consider, for example, model (2.7). Under the parametric setting
$(\mu, \sigma) = (\mu_\theta, \sigma_\theta)$, the theoretical stationary density of $\{X_t\}$ is
given by $f_\theta = f_{\mu_\theta,\sigma_\theta}$ in (6.1). Let $\hat f$ be the nonparametric kernel
density estimate in (6.2). Then $\theta$ can be estimated by

$$\hat\theta = \arg\min_\theta \frac{1}{n}\sum_{i=1}^n [f_\theta(X_i) - \hat f(X_i)]^2.$$

Aït-Sahalia [2] establishes $\sqrt{n}$-consistency for $\hat\theta$.

Other contributions include the martingale estimating function method in Bibby
and Sørensen [.'>'3] and Kessler and Sørensen [118], among others.

5.4. Parameter estimations in stochastic volatility models

As argued in Section 5.1, it is computationally infeasible to estimate parameters
in stochastic volatility models using direct maximum likelihood methods. Here
we briefly review some alternatives. One popular approach is the family of moment
based methods in, for example, Andersen and Lund [16], Andersen and Sørensen
[17], Gallant and Tauchen [86], Melino and Turnbull [134], Taylor [152], and
Wiggins [158]. The basic idea is to express the parameters of interest in terms of
population (conditional) moments and replace the latter by sample (conditional)
moments, or employ the generalized method of moments (GMM) in Section
5.2. Andersen and Sørensen [17] study the finite sample performance of GMM
estimation of the stochastic volatility model (3.6). Andersen et al. [l^>] examine
small-sample properties of the efficient method of moments proposed by Bansal
et al. [20] and Gallant and Tauchen [86]. Other methods include quasi-maximum
likelihood in Ruiz [144] and Harvey et al. [100], Bayesian Markov-chain Monte
Carlo methods in Chib et al. [50], Jacquier et al. [108; 109] and Kim et al. [119],
and the method in Genon et al. [89]. See the survey paper by Broto and Ruiz [46].

6. Model estimation: nonparametric methods
6.1. Nonparametric density estimates

One important goal of financial econometrics is to study the distribution of
returns from financial markets. Such a distribution can provide rich information
about the underlying process driving the financial markets. For example,
Mandelbrot [132] finds that cotton price changes have heavy tails relative to
normal distributions. This motivated him to use stable distributions as a possible
alternative to the traditional normal distribution. For the past three decades,
the leptokurtic property and volatility smile observed in financial data have
been the driving force in the search for more appropriate models than the
Black-Scholes model to account for empirical characteristics of financial data.

To appreciate the idea more, consider model (2.7). Let $f$ be the marginal
density function of the stationary solution $X_t$ on $D = (D_l, D_u)$ with
$-\infty \le D_l < D_u \le +\infty$. Under some regularity conditions, $f$ is given by

$$f(x) = f_{\mu,\sigma}(x) = \frac{c(x_0)}{\sigma^2(x)}\exp\Big\{\int_{x_0}^x \frac{2\mu(y)}{\sigma^2(y)}\,dy\Big\}, \tag{6.1}$$

where the choice of the lower bound point $x_0 \in D$ is irrelevant, and $c(x_0)$ is a
normalizing constant to ensure that $f$ is a probability density on $D$; see
Aït-Sahalia [2]. Therefore, the marginal density $f$ has an intrinsic connection to
the drift $\mu$ and the diffusion $\sigma$, which can be used to do model validation or
model parameter estimation. For example, Aït-Sahalia [2] and Zhao [170] study the
model validation problem $H_0: (\mu, \sigma) = (\mu_\theta, \sigma_\theta)$ for model (2.7) by
comparing the nonparametric density estimate and the parametric density estimate
under $H_0$; see Section 7.6. Aït-Sahalia [1] constructs a nonparametric estimate of
the diffusion function $\sigma$ through a nonparametric density estimate.

Let $\{X_t\}_{t\in T}$ be a generic stationary process. For example, it could be the
continuous-time process (2.7), a discrete-time process including aggregated
returns in (2.12), or a nonlinear time series in Section 3.1, among others. Given
discrete observations $\{X_i\}_{1\le i\le n}$, the classical nonparametric kernel density
estimate for the density $f$ of $X_i$ is given by

$$\hat f(x) = \frac{1}{nb_n}\sum_{i=1}^n K\Big(\frac{x - X_i}{b_n}\Big), \tag{6.2}$$

where $K$ is a kernel function and $b_n > 0$ is a bandwidth. See Silverman [147]. Wu
and Mielniczuk [162] study the asymptotic behavior of $\hat f$ for linear processes.
Assuming some mixing conditions, it is possible to establish asymptotic normality
for $\hat f$ with optimal rate $O(n^{-2/5})$.
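
A kernel density estimate of this classical form takes only a few lines. The following is a minimal sketch, assuming a Gaussian kernel and a rule-of-thumb bandwidth; both choices, the function name `kde`, and the simulated heavy-tailed "returns" are illustrative assumptions rather than recommendations from the cited literature.

```python
import numpy as np

def kde(x_grid, data, b):
    """Gaussian-kernel density estimate (1/(n b)) * sum_i K((x - X_i)/b)."""
    u = (np.asarray(x_grid)[:, None] - np.asarray(data)[None, :]) / b
    return np.exp(-0.5 * u ** 2).sum(axis=1) / (len(data) * b * np.sqrt(2.0 * np.pi))

rng = np.random.default_rng(3)
returns = 0.01 * rng.standard_t(df=4, size=2000)      # hypothetical heavy-tailed returns
b = 1.06 * returns.std() * len(returns) ** (-0.2)     # rule-of-thumb bandwidth (assumption)
grid = np.linspace(-0.15, 0.15, 1201)
f_hat = kde(grid, returns, b)
mass = float(np.sum(f_hat) * (grid[1] - grid[0]))     # Riemann sum of the estimate over the grid
```

The bandwidth shrinks like $n^{-1/5}$ here, which is the order associated with the $O(n^{-2/5})$ rate mentioned above.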

Recently, Schick and Wefelmeyer [145] and Kim and Wu [120] propose
convolution-type efficient density estimates that achieve the parametric rate
$O(n^{-1/2})$. To appreciate the idea, consider the model

$$X_i = \mu_\theta(X_{i-1}) + \varepsilon_i, \tag{6.3}$$

where $(\varepsilon_i)_{i\in\mathbb{Z}}$ are iid with $E(\varepsilon_0) = 0$, and $\mu_\theta$ is a known
parametric form with unknown parameter $\theta$. The popular AR, TAR, and EAR models
are of form (6.3). The ARCH model $X_i = \sqrt{a^2 + b^2X_{i-1}^2}\,\varepsilon_i$ is also of
form (6.3) after transformation:
$\log(X_i^2) = \log(a^2 + b^2X_{i-1}^2) + \log(\varepsilon_i^2)$. Denote by $f_{\mu_\theta(X)}$ and
$f_\varepsilon$ the density functions of $\mu_\theta(X_0)$ and $\varepsilon_0$, respectively. By
convolution,

$$f(x) = \int f_{\mu_\theta(X)}(y)\, f_\varepsilon(x - y)\,dy. \tag{6.4}$$

Then the convolution-type estimate procedure works as follows:

(a) Obtain a $\sqrt{n}$-consistent estimate $\hat\theta$ of $\theta$ by the least-squares
method or an M-estimation method.

(b) Compute $\mu_{\hat\theta}(X_{i-1})$ and $\hat\varepsilon_i = X_i - \mu_{\hat\theta}(X_{i-1})$, $1 \le i \le n$.

(c) Obtain nonparametric kernel density estimates $\hat f_{\mu_\theta(X)}$ and
$\hat f_\varepsilon$ of $f_{\mu_\theta(X)}$ and $f_\varepsilon$, respectively, from the estimated values
$\mu_{\hat\theta}(X_{i-1})$ and $\hat\varepsilon_i$ via (6.2).

(d) In (6.4), replace $f_{\mu_\theta(X)}$ and $f_\varepsilon$ by their estimates, and obtain
$\hat f$.

Kim and Wu [120] establish a $\sqrt{n}$ central limit theorem for the resulting
convolution-type estimate $\hat f$. Schick and Wefelmeyer [145] obtain a similar
result for linear processes. Zhao [171] studies efficient density estimation for
conditional heteroscedastic models.
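
Steps (a) to (d) can be sketched for the simplest instance of (6.3), the AR(1) model $\mu_\theta(x) = \theta x$. The code below is an illustration under stated assumptions, not the estimator exactly as analyzed in [120] or [145]: it uses a Gaussian kernel, a single fixed bandwidth for both density estimates, and a Riemann sum for the integral in (6.4). The helper names and tuning values are hypothetical.

```python
import numpy as np

def gauss_kde(x_grid, data, b):
    """Gaussian-kernel density estimate evaluated on x_grid."""
    u = (x_grid[:, None] - data[None, :]) / b
    return np.exp(-0.5 * u ** 2).sum(axis=1) / (len(data) * b * np.sqrt(2.0 * np.pi))

def convolution_density(x_grid, x, b, y_grid):
    """Steps (a)-(d) for the AR(1) case mu_theta(x) = theta * x."""
    x0, x1 = x[:-1], x[1:]
    theta_hat = np.sum(x0 * x1) / np.sum(x0 ** 2)     # (a) least-squares estimate
    fitted = theta_hat * x0                           # (b) fitted values mu(X_{i-1})
    resid = x1 - fitted                               #     and residuals
    f_mu = gauss_kde(y_grid, fitted, b)               # (c) density of the fitted values
    dy = y_grid[1] - y_grid[0]
    # (d) Riemann-sum version of (6.4): integrate f_mu(y) * f_eps(x - y) over y.
    return np.array([np.sum(f_mu * gauss_kde(xv - y_grid, resid, b)) * dy for xv in x_grid])

rng = np.random.default_rng(11)
n, theta = 1000, 0.5                                  # hypothetical AR(1) with N(0,1) errors
x = np.zeros(n + 1)
for i in range(n):
    x[i + 1] = theta * x[i] + rng.standard_normal()

x_grid = np.linspace(-6.0, 6.0, 41)
f_hat = convolution_density(x_grid, x, b=0.2, y_grid=np.linspace(-6.0, 6.0, 601))
true_at_zero = 1.0 / np.sqrt(2.0 * np.pi / (1.0 - theta ** 2))   # N(0, 1/(1-theta^2)) density at 0
```

In this Gaussian example the stationary density is known in closed form, so `f_hat` at the grid midpoint can be compared with `true_at_zero` directly.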

Since the distributional property of a stationary Markov process can be
characterized by the marginal and transition density functions, let us now
consider the transition density of a Markov process $\{X_t\}_{t\in T}$ based on discrete
observations $\{X_i\}_{1\le i\le n}$. By the Markovian property, it suffices to consider
the transition density at time lag one, that is, the conditional density function
$\pi(x|x')$ of $X_1$ at $x$ given that $X_0 = x'$. Denote by $\pi(x, x')$ the joint density
of $(X_1, X_0)$ at $(x, x')$ and by $f$ the marginal density of $X_0$. Since
$\pi(x|x') = \pi(x, x')/f(x')$, a nonparametric estimate of $\pi(x|x')$ can be
constructed by plugging in the nonparametric estimates of the latter two
densities.

6.2. Nonparametric function estimation 

In contrast to parametric methods, nonparametric methods for function
estimation do not assume any parametric form of the function other than certain
smoothness assumptions. Suppose, for instance, that we want to estimate the mean
regression function $E(X_i|Y_i = y)$. Let $f(y)$ and $f(x, y)$ be the densities of
$Y_i$ and $(X_i, Y_i)$, respectively. Then we have

$$E(X_i|Y_i = y) = \frac{1}{f(y)}\int x f(x, y)\,dx \approx \frac{\sum_{i=1}^n X_i K_{b_n}(y - Y_i)}{\sum_{i=1}^n K_{b_n}(y - Y_i)}, \tag{6.5}$$

where $K_{b_n}(u) = K(u/b_n)$ for a kernel $K$ and bandwidth $b_n$.

Expression (6.5) can be used to construct nonparametric estimates for drift
and volatility functions in financial models. Consider model (3.3), and assume
that $E(\varepsilon_0) = 0$ and $E(\varepsilon_0^2) = 1$. Then $\mu(y) = E(X_i|Y_i = y)$ and
$\sigma^2(y) = E\{[X_i - \mu(Y_i)]^2|Y_i = y\}$. The idea is as follows:

(a) Apply (6.5) to get a nonparametric estimate $\hat\mu$ of $\mu$.

(b) Compute residuals $\hat\varepsilon_i = X_i - \hat\mu(Y_i)$.

(c) Apply (6.5) to $(\hat\varepsilon_i^2, Y_i)$ to nonparametrically estimate $\sigma^2$.

See Fan and Yao [76] and Zhao and Wu [174] for related works. The latter papers
also show that the volatility function can be estimated as well as if we knew the
drift function. The intuition is that the bias term resulting from estimating the
drift is of order $O(b_n^2)$ and is squared to $O(b_n^4)$ when estimating the volatility.
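
The three steps above can be sketched as follows. This is a minimal illustration, assuming a Gaussian kernel, a fixed bandwidth, and simulated data from a regression model with hypothetical drift `mu` and volatility `sig`; none of these choices come from the cited papers.

```python
import numpy as np

def nw(y0, y, x, b):
    """Local constant (kernel-weighted average) estimate of E(X | Y = y0)."""
    w = np.exp(-0.5 * ((y0 - y) / b) ** 2)
    return float(np.sum(w * x) / np.sum(w))

rng = np.random.default_rng(4)
n, b = 5000, 0.15
y = rng.uniform(-2.0, 2.0, n)
mu = lambda t: np.sin(t)                    # hypothetical drift function
sig = lambda t: 0.5 + 0.25 * t ** 2         # hypothetical volatility function
x = mu(y) + sig(y) * rng.standard_normal(n)

mu_fit = np.array([nw(v, y, x, b) for v in y])   # (a) drift estimate at each Y_i
resid2 = (x - mu_fit) ** 2                       # (b) squared residuals
mu_hat_1 = nw(1.0, y, x, b)                      #     drift estimate at y = 1
sigma2_hat_1 = nw(1.0, y, resid2, b)             # (c) volatility estimate at y = 1
```

Evaluating the drift at every sample point costs $O(n^2)$ operations in this direct form, which is acceptable for a few thousand observations.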

To apply the above idea to the estimation of the continuous-time model
(2.7), we follow Stanton [149] and introduce the infinitesimal generator
$\mathcal{L}$ of $X_t$, given by

$$\mathcal{L}g(x, t) = \lim_{\Delta\downarrow 0}\frac{E[g(X_{t+\Delta}, t+\Delta)|X_t = x] - g(x, t)}{\Delta} = \frac{\partial g(x,t)}{\partial t} + \mu(x)\frac{\partial g(x,t)}{\partial x} + \frac{\sigma^2(x)}{2}\frac{\partial^2 g(x,t)}{\partial x^2} \tag{6.6}$$

in view of Itô's formula. As a special case, if the function $g$ does not depend on
$t$, then (6.6) reduces to (5.8). Applying a Taylor expansion to (6.6),

$$E[g(X_{t+\Delta}, t+\Delta)|X_t] = g(X_t, t) + \mathcal{L}g(X_t, t)\Delta + \frac{1}{2}\mathcal{L}^2g(X_t, t)\Delta^2 + \cdots. \tag{6.7}$$

Thus, a first order approximation of $\mathcal{L}g(X_t, t)$ is

$$\mathcal{L}g(X_t, t) = \frac{1}{\Delta}E\{[g(X_{t+\Delta}, t+\Delta) - g(X_t, t)]|X_t\} + O(\Delta). \tag{6.8}$$

Taking $g(x, t) = x$, then $\mathcal{L}g(X_t, t) = \mu(X_t)$ and

$$\mu(x) = \frac{1}{\Delta}E[(X_{t+\Delta} - X_t)|X_t = x] + O(\Delta). \tag{6.9}$$

Similarly, taking $g(x, t) = (x - X_t)^2$, we have

$$\sigma^2(x) = \frac{1}{\Delta}E[(X_{t+\Delta} - X_t)^2|X_t = x] + O(\Delta). \tag{6.10}$$

Therefore, (6.9) and (6.10) can be used to construct nonparametric estimates
of $\mu$ and $\sigma$ in conjunction with (6.5).

For the simplified model $dX_t = \sigma(X_t)dW_t$, Arfi [18] uses (6.10) to estimate
$\sigma^2$ via kernel smoothing. Other contributions on nonparametric estimation of
(2.7) include Bandi [23], Bandi and Phillips [24], Florens-Zmirou [82], Foster
and Nelson [83], and Jiang and Knight [112]. Higher-order approximations are
considered in Stanton [149] to reduce biases. Fan and Zhang [78], however, argue
that the bias reduction is achieved at the cost of an exponential increase of the
variance. Therefore, they suggest that one should avoid using approximations of
too high an order in practice.
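
The first-order estimators based on (6.9) and (6.10) can be sketched by combining the increments with kernel weights as in (6.5). The example below is a hedged illustration: the path is simulated from an Ornstein-Uhlenbeck process with hypothetical parameters, and the Gaussian kernel, bandwidth, and function name `stanton_first_order` are assumptions of this sketch.

```python
import numpy as np

def stanton_first_order(x0, path, delta, b):
    """Kernel versions of (6.9) and (6.10) at the point x0 (Gaussian kernel)."""
    x, dx = path[:-1], np.diff(path)
    w = np.exp(-0.5 * ((x0 - x) / b) ** 2)
    drift = np.sum(w * dx) / (delta * np.sum(w))             # estimate of mu(x0)
    diffusion2 = np.sum(w * dx ** 2) / (delta * np.sum(w))   # estimate of sigma^2(x0)
    return float(drift), float(diffusion2)

rng = np.random.default_rng(5)
beta, sigma, delta, n = 1.0, 0.5, 0.01, 200000   # hypothetical: dX = -beta X dt + sigma dW
rho = np.exp(-beta * delta)
sd = sigma * np.sqrt((1.0 - rho ** 2) / (2.0 * beta))
z = rng.standard_normal(n)
path = np.zeros(n + 1)
for i in range(n):                               # exact Gaussian transitions
    path[i + 1] = rho * path[i] + sd * z[i]

drift_hat, sigma2_hat = stanton_first_order(0.2, path, delta, b=0.1)
```

In simulations of this kind the diffusion estimate is typically far more precise than the drift estimate, since the variance of the latter is governed by the time span $n\Delta$ rather than by the number of observations.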

For the continuous-time model (2.7), an alternative approach is based on
the Euler approximation scheme (5.2). Then we can estimate $\mu$ and $\sigma$ through
the expressions:

$$E[\Delta^{-1}(X_{t+\Delta} - X_t)|X_t = x] = \mu(x);$$
$$E[\Delta^{-1}(X_{t+\Delta} - X_t - \mu(X_t)\Delta)^2|X_t = x] = \sigma^2(x).$$

For (5.2) to approximate (2.7) with reasonable accuracy, $\Delta$ needs to be very
small. Therefore, (5.2) is often useful in dealing with high-frequency data over
a long time span: $\Delta = \Delta_n \to 0$ and $n\Delta_n \to \infty$.

The estimate (6.5) is basically a local constant fit based on weighted least
squares. There are two natural variants. The first one is the local linear method.
Let $\mu(y) = E(X_i|Y_i = y)$. The local linear estimate of $(\mu(y), \mu'(y))$ is

$$(\hat\mu(y), \hat\mu'(y)) = \arg\min_{(a,b)}\sum_{i=1}^n [X_i - a - b(Y_i - y)]^2 K_{b_n}(y - Y_i). \tag{6.11}$$

Local linear estimates can reduce boundary effects. For model (3.3), Fan and Yao
[76] use (6.11) to estimate $\mu$ first, and then apply (6.11) to the squared
residuals $[X_i - \hat\mu(Y_i)]^2$ to estimate $\sigma^2$.
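
The boundary behavior can be seen in a small noiseless example. The sketch below solves the weighted least-squares problem in (6.11) through its normal equations, assuming a Gaussian kernel; the slope is named `c` in the code to avoid a clash with the bandwidth `b`, and the data are an exactly linear function chosen only for illustration.

```python
import numpy as np

def local_linear(y0, y, x, b):
    """Minimize sum_i [x_i - a - c (y_i - y0)]^2 K((y0 - y_i)/b) over (a, c)."""
    w = np.exp(-0.5 * ((y0 - y) / b) ** 2)             # Gaussian kernel weights
    design = np.column_stack([np.ones_like(y), y - y0])
    wd = design * w[:, None]
    a, c = np.linalg.solve(design.T @ wd, wd.T @ x)    # weighted normal equations
    return float(a), float(c)                          # estimates of mu(y0) and mu'(y0)

y = np.linspace(0.0, 1.0, 201)
x = 2.0 + 3.0 * y                                      # exactly linear, no noise
level, slope = local_linear(0.0, y, x, b=0.1)          # fit at the boundary point y0 = 0

w0 = np.exp(-0.5 * ((0.0 - y) / 0.1) ** 2)
local_constant = float(np.sum(w0 * x) / np.sum(w0))    # local constant fit at the same point
```

At the boundary the local constant fit averages only over points to the right of $y = 0$ and is pulled upward, while the local linear fit reproduces the line exactly.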

Another variant of (6.5) is the least-absolute-deviation (LAD) estimate. In
model (3.3), assume that $\mathrm{median}(\varepsilon_0) = 0$ and $\mathrm{median}(|\varepsilon_0|) = 1$. Then
$\mu(y) = \mathrm{median}(X_i|Y_i = y)$ and $\sigma(y) = \mathrm{median}[|X_i - \mu(Y_i)|\,|\,Y_i = y]$.
Thus, the LAD estimates of $\mu$ and $\sigma$ are

$$\hat\mu(y) = \arg\min_a \sum_{i=1}^n |X_i - a|\, K_{b_n}(y - Y_i), \tag{6.12}$$

$$\hat\sigma(y) = \arg\min_a \sum_{i=1}^n \big||X_i - \hat\mu(Y_i)| - a\big|\, K_{b_n}(y - Y_i). \tag{6.13}$$

Basically, the LAD estimate is a local median type estimate and hence it is
robust against outliers. Under a very general dependence structure, Zhao and Wu
[172] study the asymptotic properties of the LAD estimates for (3.3). The results
obtained are applicable to a variety of time series models, including linear
processes and nonlinear models (3.1) and (3.3).
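
Since a minimizer of a weighted sum of absolute deviations is a weighted median, (6.12) can be computed by sorting. The following is a minimal sketch, assuming a Gaussian kernel and simulated data contaminated by gross outliers; the helper names and the contamination scheme are hypothetical.

```python
import numpy as np

def weighted_median(values, weights):
    """A minimizer of sum_i weights_i * |values_i - a| over a."""
    order = np.argsort(values)
    v, cw = values[order], np.cumsum(weights[order])
    # First sorted value at which the cumulative weight reaches half of the total.
    return float(v[np.searchsorted(cw, 0.5 * cw[-1])])

def lad_fit(y0, y, x, b):
    """Local-median version of (6.12) at y0, using a Gaussian kernel."""
    return weighted_median(x, np.exp(-0.5 * ((y0 - y) / b) ** 2))

rng = np.random.default_rng(6)
n = 4000
y = rng.uniform(0.0, 2.0, n)
x = np.sin(y) + 0.3 * rng.standard_normal(n)
x[rng.random(n) < 0.05] += 50.0                        # about 5% gross outliers

w = np.exp(-0.5 * ((1.0 - y) / 0.2) ** 2)
lad_hat = lad_fit(1.0, y, x, b=0.2)
mean_hat = float(np.sum(w * x) / np.sum(w))            # local constant fit, for contrast
```

The local median stays close to $\sin(1)$, whereas the kernel-weighted mean is shifted substantially by the outliers.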
6.3. Semi-parametric estimation via nonparametric density

Semi-parametric models lie between parametric and nonparametric ones by
imposing a parametric form on part of the model while keeping other parts
nonparametric. They are particularly useful when we have prior knowledge about
part of the data generating process while staying flexible on the remaining
parts. For example, consider model (2.7). We have two semi-parametrizations:
(i) $\mu = \mu_\theta$ for a known parametric form $\mu_\theta$ with unknown parameter
$\theta$; and (ii) $\sigma = \sigma_\theta$ for a known parametric form $\sigma_\theta$ with unknown
parameter $\theta$. The two frameworks have different ranges of applicability. Many
empirical studies suggest fitting a simple form for $\mu$. For example,
[1; 48; 53; 156] use a linear form for $\mu$, and [2] uses the nonlinear drift form
(2.6) to fit interest rates. For high-frequency data (say, daily, hourly, or
5-minute), it is even reasonable to assume $\mu$ to be constant or zero since we
are more interested in the volatility. Volatility is very important in options
pricing: options written on volatile assets are more expensive. In such
circumstances, it is desirable to treat the volatility nonparametrically to
avoid mis-specification. On the other hand, in some cases it may be reasonable
to assume a parametric form for the volatility function while keeping the drift
nonparametric; see Kristensen [123] and Banon [25]. For an extensive exposition
of semi-parametric methods, see Gao [87].

Here, we briefly review the estimation methods in [1] and [25] for the two
semi-parametrizations of model (2.7). Let $p(\Delta, x|x')$ be the transition density
of $X_{t+\Delta}$ at $x$ given $X_t = x'$, and $\pi(x)$ the stationary density of $X_t$. Then
the Kolmogorov forward equation associated with (2.7) is

$$\frac{\partial p(\Delta, x|x')}{\partial\Delta} = -\frac{\partial[\mu(x)p(\Delta, x|x')]}{\partial x} + \frac{\partial^2[\sigma^2(x)p(\Delta, x|x')]}{2\,\partial x^2}. \tag{6.14}$$

By stationarity, $\int p(\Delta, x|x')\pi(x')\,dx' = \pi(x)$. Multiplying (6.14) by $\pi(x')$
and integrating with respect to $x'$ gives

$$\frac{d^2[\sigma^2(x)\pi(x)]}{dx^2} = 2\frac{d[\mu(x)\pi(x)]}{dx}. \tag{6.15}$$

Aït-Sahalia [1] assumes $\mu = \mu_\theta$. Integrating (6.15) twice, we obtain

$$\sigma^2(x) = \frac{2}{\pi(x)}\int_0^x \mu_\theta(u)\pi(u)\,du. \tag{6.16}$$

Let $\hat\theta$ be a consistent estimate of $\theta$, and $\hat\pi(\cdot)$ a nonparametric kernel
density estimate constructed as in (6.2). Then we can plug $\hat\theta$ and $\hat\pi$ into
(6.16) to obtain a nonparametric estimate of $\sigma^2(x)$. For
$\mu_\theta(x) = \beta(\alpha - x)$, $\theta = (\alpha, \beta)$, $\theta$ can be estimated through the
regression equation $E(X_{t+\Delta}|X_t) = \alpha + e^{-\beta\Delta}(X_t - \alpha)$. This approach has
an apparent advantage: it works regardless of the size of $\Delta$, while Stanton's
[149] method requires high-frequency data with $\Delta \to 0$. See [1] for more details.
Banon [25] integrates (6.15) once to obtain

$$\mu(x) = \frac{1}{2\pi(x)}\frac{d[\sigma^2(x)\pi(x)]}{dx}. \tag{6.17}$$



Z. Zhao/Nonparametric methods in financial econometrics 






Banon [25] considers a constant but unknown $\sigma$. The drift $\mu$ is
nonparametrically estimated using (6.17) with estimated $\sigma$ and density
$\hat\pi(\cdot)$.
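
As a quick consistency check of the relations obtained by integrating (6.15) twice and once (an illustration added here, not part of [1] or [25]), consider the Ornstein-Uhlenbeck case $\mu(x) = \beta(\alpha - x)$ with constant $\sigma$. We assume the state space is the whole real line, so the lower limit of integration becomes $-\infty$, and $\pi$ is the $N(\alpha, \sigma^2/(2\beta))$ density:

```latex
\pi'(u) = -\frac{2\beta(u-\alpha)}{\sigma^2}\,\pi(u)
\;\Longrightarrow\;
\beta(\alpha-u)\,\pi(u) = \frac{\sigma^2}{2}\,\pi'(u),
\\
\frac{2}{\pi(x)}\int_{-\infty}^{x}\beta(\alpha-u)\,\pi(u)\,du
= \frac{2}{\pi(x)}\cdot\frac{\sigma^2}{2}\,\pi(x) = \sigma^2,
\\
\frac{1}{2\pi(x)}\frac{d[\sigma^2\pi(x)]}{dx}
= \frac{\sigma^2\,\pi'(x)}{2\pi(x)} = \beta(\alpha-x).
```

Both identities return the functions we started from, as they should.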

6.4. Nonparametric integrated volatility estimation

Most finance theory lies within the semimartingale framework. A stochastic
process $\{X_t\}_{t\ge 0}$ (assuming $X_0 = 0$) is said to be a continuous semimartingale
with respect to a filtration $\{\mathcal{F}_t\}_{t\ge 0}$ if

$$X_t = M_t + A_t, \quad t \ge 0, \tag{6.18}$$

where $\{A_t, \mathcal{F}_t\}_{t\ge 0}$ is an adapted process with bounded variation paths on any
finite subinterval of $[0, \infty)$, and $\{M_t, \mathcal{F}_t\}_{t\ge 0}$ is a continuous local
martingale; see p. 149 in Karatzas and Shreve [116]. Denote by $\langle X\rangle_t$ the
quadratic variation process of $\{X_t\}_{t\ge 0}$, that is, the unique adapted and
increasing process such that $\langle X\rangle_0 = 0$ and $\{X_t^2 - \langle X\rangle_t\}$ is a martingale
(cf. Doob-Meyer decomposition). For a continuous semimartingale,
$\langle X\rangle_t = X_t^2 - 2\int_0^t X_s\,dX_s$. The fundamental result states that

$$\lim_{\max_i(t_{i+1}-t_i)\to 0}\ \sum_{i=0}^{n-1}(X_{t_{i+1}} - X_{t_i})^2 = \langle X\rangle_t, \quad \text{in probability}, \tag{6.19}$$

for all partitions $0 = t_0 < t_1 < \cdots < t_{n-1} < t_n = t$. The left hand side is often
called realized volatility.

The latter result has important implications in finance theory. Consider the
general stochastic volatility model

$$d\log(S_t) = \mu_t\,dt + \sigma_t\,dW_t, \tag{6.20}$$

where the drift $\{\mu_t\}$ and the volatility $\{\sigma_t\}$ are two adapted stochastic
processes. In this special case, it can be shown using Itô's lemma that
$\langle\log(S)\rangle_t = \int_0^t \sigma_s^2\,ds$ is the integrated volatility (in contrast to the
spot volatility $\sigma_t$). Therefore, without assuming any structure on $\{\mu_t\}$ and
$\{\sigma_t\}$ other than some regularity conditions, we can estimate the integrated
volatility nonparametrically using the realized volatility

$$\int_0^t \sigma_s^2\,ds \approx \sum_{0=t_0<t_1<\cdots<t_{n-1}<t_n=t} [\log(S_{t_{i+1}}) - \log(S_{t_i})]^2. \tag{6.21}$$

In practice, one can compute daily, weekly, or monthly realized volatility based
on high-frequency (say, 5-minute) data. Research along this line was initiated by
Andersen and Bollerslev [12] and Barndorff-Nielsen and Shephard [28].
Barndorff-Nielsen and Shephard [29] further obtain the asymptotic normality of
the realized volatility. See the survey paper by Barndorff-Nielsen and Shephard
[30]. Recent contributions include Andersen et al. [13], Mykland and Zhang [137],
Zhang [167], and Zhang et al. [168]. The latter two papers deal with integrated
volatility estimation for noisy high-frequency data. Zhao and Wu [174] study
integrated volatility estimation for Lévy processes.
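
The approximation (6.21) is straightforward to check by simulation. The sketch below assumes a deterministic, time-varying spot volatility and a constant drift, both hypothetical, and compares the realized volatility with a Riemann sum for the integrated volatility; no microstructure noise is included.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 78 * 250                                          # hypothetical: 5-minute returns, 250 days
t = np.linspace(0.0, 1.0, n + 1)
sigma_t = 0.2 + 0.1 * np.sin(2.0 * np.pi * t[:-1])    # assumed spot volatility path
dt = 1.0 / n

# Euler increments of log(S_t) under d log(S_t) = mu dt + sigma_t dW_t with mu = 0.05.
log_returns = 0.05 * dt + sigma_t * np.sqrt(dt) * rng.standard_normal(n)
log_price = np.concatenate(([0.0], np.cumsum(log_returns)))

realized = float(np.sum(np.diff(log_price) ** 2))     # right-hand side of (6.21)
integrated = float(np.sum(sigma_t ** 2) * dt)         # Riemann sum for the integral of sigma_s^2
```

The drift contributes only terms of order $\Delta^2$ per increment, which is why it can be left unspecified.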
7. Model validation

As we have discussed in Section 2 and Section 3, there has never been a lack of
parametric models. Parametric models can provide a parsimonious interpretation
of the data generating mechanism underlying the process, yet this is true only
when the parametric models are correctly specified. For any parametric model,
there is always a mis-specification risk that could lead to wrong conclusions. In
fact, a correct specification for the price of the underlying asset is particularly
important for the pricing of derivatives written on that asset. Therefore, one
has to validate the adequacy of the parametric model before applying it to
real data. Suppose that $Q$ is the unknown characteristic of interest behind the
underlying data generating mechanism. For example, in (2.7) and (3.1), we may
take $Q = (\mu, \sigma)$. For a given specification $Q_\theta$ with possibly unknown
parameter $\theta$, we want to test the null hypothesis $H_0: Q = Q_\theta$. This problem is
often termed specification testing, model validation, model checking, or
goodness-of-fit, among others. There is a huge literature on model validation. In
this section we review some popular model validation techniques. In the rest of
this section we implicitly assume that the alternative hypothesis is that the
relevant functions are fully nonparametric.

7.1. Pseudo-likelihood ratio test 

We briefly introduce the pseudo-likelihood ratio test (PLRT) in Azzalini and
Bowman [19]. The PLRT compares the pseudo-likelihoods under both the null and
the alternative. To appreciate the idea, we consider the simple regression model
$Y_i = \mu(X_i) + \varepsilon_i$. Suppose that we are interested in testing the null
hypothesis $H_0: \mu(x) = a + bx$ for some $a, b \in \mathbb{R}$.

Under $H_0$, we have a simple linear regression problem. Denote by $\mathbf{X}$ and
$\mathbf{Y}$ the design matrix and the vector of responses, respectively. Under $H_0$, the
fitted values are $\hat{\mathbf{Y}}_0 = \mathbf{H}_0\mathbf{Y}$, where
$\mathbf{H}_0 = \mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T$. Then the residuals vector is
$\mathbf{e} = \mathbf{Y} - \mathbf{H}_0\mathbf{Y}$, and the residual sum of squares is
$RSS_0 = \mathbf{e}^T\mathbf{e} = \mathbf{Y}^T(\mathbf{I} - \mathbf{H}_0)\mathbf{Y}$. Under the alternative hypothesis of
a nonparametric setting, $\mu$ can be estimated by the nonparametric regression
methods (6.5) or (6.11) in Section 6.2. The fitted values are
$\hat{\mathbf{Y}}_1 = \mathbf{H}_1\mathbf{Y}$, where $\mathbf{H}_1$ is a weighting matrix depending on
$\mathbf{X}$, and the residual sum of squares is
$RSS_1 = \mathbf{Y}^T(\mathbf{I} - \mathbf{H}_1)^T(\mathbf{I} - \mathbf{H}_1)\mathbf{Y}$. Then a naive PLRT is given by

$$T = \frac{RSS_0 - RSS_1}{RSS_1}.$$

Azzalini and Bowman [19] point out that the null distribution of $T$ depends
on the linear coefficient $b$, which makes $T$ unsuitable for hypothesis testing.
To overcome this difficulty, they view the residuals vector
$\mathbf{e} = \mathbf{Y} - \mathbf{H}_0\mathbf{Y} = (\mathbf{I} - \mathbf{H}_0)\mathbf{Y}$ as the new responses vector and apply
the PLRT to $\mathbf{e}$ instead of $\mathbf{Y}$. They propose the new test statistic based on
$\mathbf{e}$:

$$T^* = \frac{\mathbf{e}^T\mathbf{e} - \mathbf{e}^T(\mathbf{I} - \mathbf{H}_1)^T(\mathbf{I} - \mathbf{H}_1)\mathbf{e}}{\mathbf{e}^T(\mathbf{I} - \mathbf{H}_1)^T(\mathbf{I} - \mathbf{H}_1)\mathbf{e}}.$$

A large value of $T^*$ indicates rejection of $H_0$. Under a normality assumption on
the distribution of $\varepsilon$, the null distribution of $T^*$ is related to quadratic
forms of normal variables; see the references in [19]. It is clear that one can
extend the idea to more general model validation settings, although the test
statistic for nonlinear parametric models may not possess an explicit form as in
the linear regression case; see [20] and [72] for related works.
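
The residual-based statistic can be sketched with explicit hat and smoother matrices. The code below is an illustration under stated assumptions: the smoother $\mathbf{H}_1$ is a local constant kernel smoother with a Gaussian kernel and fixed bandwidth, and the statistic is computed as the relative reduction in the residual sum of squares obtained by smoothing the null residuals. The function name `plrt_star` and the two simulated data sets are hypothetical.

```python
import numpy as np

def plrt_star(x, y, b):
    """Residual-based PLRT statistic for H0: mu(x) = a + b x, kernel smoother as H1."""
    n = len(x)
    design = np.column_stack([np.ones(n), x])
    h0 = design @ np.linalg.solve(design.T @ design, design.T)   # hat matrix under H0
    e = y - h0 @ y                                               # residuals under H0
    k = np.exp(-0.5 * ((x[:, None] - x[None, :]) / b) ** 2)
    h1 = k / k.sum(axis=1, keepdims=True)                        # smoother matrix H1
    rss1 = np.sum((e - h1 @ e) ** 2)                             # e'(I - H1)'(I - H1)e
    return float((e @ e - rss1) / rss1)

rng = np.random.default_rng(8)
x = np.sort(rng.uniform(0.0, 1.0, 300))
noise = 0.3 * rng.standard_normal(300)
t_null = plrt_star(x, 1.0 + 2.0 * x + noise, b=0.1)                              # linear truth
t_alt = plrt_star(x, 1.0 + 2.0 * x + np.sin(2.0 * np.pi * x) + noise, b=0.1)     # nonlinear truth
```

Under the linear truth the smoother removes little beyond noise and the statistic stays small, whereas under the nonlinear truth it removes the remaining structure and the statistic becomes large.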



7.2. Model validation via nonparametric curve regression 

In a stimulating paper, Härdle and Mammen [99] introduce a nonparametric
curve regression based model validation procedure. Because a nonparametric
curve estimate (for example, kernel smoothing) is always a consistent estimate of
the function of interest regardless of the underlying model, it is natural to
compare the parametric curve estimate under the null to the nonparametric curve
estimate. Consider $Y_i = \mu(X_i) + \varepsilon_i$. The popular nonparametric estimate of
$\mu$ is

$$\hat\mu(x) = \frac{\sum_{i=1}^n Y_i K_{b_n}(x - X_i)}{\sum_{i=1}^n K_{b_n}(x - X_i)}. \tag{7.1}$$

Under $H_0: \mu = \mu_\theta$, we apply parametric methods to obtain a consistent
estimate of $\theta$, denoted by $\hat\theta$. To mimic the structure of (7.1), we obtain the
following parametric estimate

$$\hat\mu_{\hat\theta}(x) = \frac{\sum_{i=1}^n \mu_{\hat\theta}(X_i) K_{b_n}(x - X_i)}{\sum_{i=1}^n K_{b_n}(x - X_i)}.$$

Härdle and Mammen [99] use an $L_2$ distance between $\hat\mu$ and $\hat\mu_{\hat\theta}$ as the
test statistic for $H_0$. See also Horowitz and Spokoiny [104].

An alternative approach is to compare the residual sum of squares under the
null parametric model to that under the nonparametric model. See Hong and White
[103] along this line.



7.3. Residuals based test 



A good statistical model would make the residuals behave like white noise.
Therefore, a natural method of model checking is to study the residuals both
graphically and quantitatively.

A stationary series $\{\varepsilon_i\}_{i\in\mathbb{N}}$ is white noise if and only if its spectral
density $g$ is the constant $\sigma^2/(2\pi)$, where $\sigma^2$ is the variance of the white
noise. Therefore, testing the white noise assumption of $\varepsilon_i$ is equivalent to
testing $g(\omega) = \sigma^2/(2\pi)$ for all frequencies $\omega \in [0, 2\pi]$. Let $I(\omega)$ be a
spectral density estimate, say, the periodogram. Fisher's test statistic is given
by

$$T = \frac{\max_{1\le k\le n} I(\omega_k)}{n^{-1}\sum_{k=1}^n I(\omega_k)},$$

where $\omega_k = 2\pi k/n$. A large value of $T$ indicates rejection of the null. Under
the null hypothesis, normalized $T$ has an asymptotic extreme value type limiting
distribution. See Section 7.4.1 in Fan and Yao [77].
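
A statistic of this max-over-average type can be computed from the discrete Fourier transform. The sketch below is a hedged illustration: it uses the periodogram at the Fourier frequencies $2\pi k/n$ with $1 \le k < n/2$ only (a common convention that exploits the symmetry of the periodogram, and an assumption of this sketch), and it does not compute a p-value. The function name and the simulated series are hypothetical.

```python
import numpy as np

def fisher_statistic(e):
    """Largest periodogram ordinate divided by the average ordinate."""
    n = len(e)
    # Periodogram from the real FFT of the centered series.
    pgram = np.abs(np.fft.rfft(e - np.mean(e))) ** 2 / (2.0 * np.pi * n)
    pgram = pgram[1:(n - 1) // 2 + 1]        # keep frequencies 2*pi*k/n, 1 <= k < n/2
    return float(pgram.max() / pgram.mean())

rng = np.random.default_rng(9)
n = 1024
noise = rng.standard_normal(n)
periodic = noise + 0.5 * np.sin(2.0 * np.pi * 50 * np.arange(n) / n)   # hidden periodicity
t_noise = fisher_statistic(noise)
t_periodic = fisher_statistic(periodic)
```

A periodic component concentrates power at one frequency, so `t_periodic` is much larger than `t_noise`.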

Another test of white noise is based on the fact that a stationary white noise
has zero autocorrelations at non-zero lags. Therefore, one can construct test
statistics to measure discrepancies between sample autocorrelations and zero.
Box and Pierce [43] and Ljung and Box [131] use, respectively, the following test
statistics:

$$T = n\sum_{k=1}^m \hat\rho^2(k) \quad \text{and} \quad T = n(n+2)\sum_{k=1}^m \frac{\hat\rho^2(k)}{n-k},$$

where $\hat\rho(k)$ is the estimated autocorrelation at lag $k$ and $m$ is the maximum
lag. Under the null hypothesis that $\{\varepsilon_i\}$ is a sequence of iid random
variables, both test statistics have asymptotic chi-square distributions. Other
residuals based tests include Fan and Li [80], and Hong and White [103].
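
Both portmanteau statistics are simple functions of the sample autocorrelations. The following is a minimal sketch, with a hypothetical function name and simulated series, that computes the Box-Pierce form $n\sum_{k\le m}\hat\rho^2(k)$ and the Ljung-Box form $n(n+2)\sum_{k\le m}\hat\rho^2(k)/(n-k)$ for white noise and for an AR(1) series.

```python
import numpy as np

def portmanteau(e, m):
    """Return the Box-Pierce and Ljung-Box statistics using lags 1..m."""
    e = np.asarray(e, dtype=float) - np.mean(e)
    n = len(e)
    denom = np.sum(e ** 2)
    # Sample autocorrelations at lags 1, ..., m.
    rho = np.array([np.sum(e[k:] * e[:-k]) / denom for k in range(1, m + 1)])
    box_pierce = n * np.sum(rho ** 2)
    ljung_box = n * (n + 2) * np.sum(rho ** 2 / (n - np.arange(1, m + 1)))
    return float(box_pierce), float(ljung_box)

rng = np.random.default_rng(10)
white = rng.standard_normal(500)
ar1 = np.zeros(500)
for i in range(1, 500):                      # AR(1) with coefficient 0.6
    ar1[i] = 0.6 * ar1[i - 1] + white[i]

bp_white, lb_white = portmanteau(white, m=10)
bp_ar1, lb_ar1 = portmanteau(ar1, m=10)
```

The Ljung-Box form is never smaller than the Box-Pierce form because each weight $(n+2)/(n-k)$ exceeds one.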

7.4. Generalized likelihood ratio test

In an attempt to address the nonparametric model validation problem, Fan et al.
[79] developed the generalized likelihood ratio test (GLRT). The basic idea
behind their test is to compare the profile likelihood under the alternative to
the maximum likelihood under the null. Let $f$ be the function or vector of
functions of interest, and $\eta$ the nuisance parameters. Denote by
$\ell(f, \eta)$ the logarithm of the likelihood for a given dataset. Suppose
that we are interested in testing the null $H_0: f = f_\theta$,
$\theta \in \Theta$. The GLRT works as follows:

(a) For given $\eta$, estimate $f$ by $\hat f_\eta$ nonparametrically.

(b) Estimate $\eta$ by $\hat\eta$, the maximizer of the likelihood
$\ell(\hat f_\eta, \eta)$.

(c) Compute the profile likelihood $\ell(\hat f_{\hat\eta}, \hat\eta)$.

(d) Under $H_0$, estimate $(\theta, \eta)$ by the maximum likelihood estimator
$(\hat\theta_0, \hat\eta_0) = \arg\max_{\theta,\eta} \ell(f_\theta, \eta)$.

(e) Compute the difference between the profile likelihood and the maximum
likelihood:

$$T = \ell(\hat f_{\hat\eta}, \hat\eta) - \ell(f_{\hat\theta_0}, \hat\eta_0).$$

A significant positive value of $T$ indicates the rejection of $H_0$.

The cutoff value of $T$ can be obtained by establishing the asymptotic
distribution of $T$. For example, Fan et al. [79] consider the
varying-coefficient model and show that the so-called Wilks phenomenon holds.
That is, the asymptotic distribution of $T$ under the null hypothesis does not
depend on nuisance parameters. The latter property can be used to determine the
cutoff value via either the asymptotic distribution or Monte Carlo simulation
for better accuracy.

As commented by Fan and Yao ([77], p. 406), the GLRT has been developed for
independent data, but the idea can be extended to time series data; see
Chapter 9 in their book. In fact, one can easily apply the GLRT procedure to
the model validation problem for models in Sections 2 and 3 in conjunction with
the nonparametric function estimation methods in Section 6.2 and the parameter
estimation methods in Section 5.
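
To make the steps concrete, here is a minimal sketch for a much simpler setting
than the varying-coefficient model of Fan et al. [79]. All of the following are
assumptions made for illustration only: a regression model with iid Gaussian
errors, a linear null hypothesis, a Nadaraya-Watson fit with Gaussian kernel
and user-chosen bandwidth `h` as the nonparametric estimate, and the error
variance as the only nuisance parameter. After profiling out the variance, the
statistic reduces to $T = (n/2)\log(\mathrm{RSS}_0/\mathrm{RSS}_1)$. The cutoff
is obtained by simulating from the fitted null model. The function names are
hypothetical.

```python
import math
import random


def linear_rss(x, y):
    """Residual sum of squares, intercept and slope of the least-squares line."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((a - xbar) ** 2 for a in x)
    slope = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / sxx
    intercept = ybar - slope * xbar
    rss = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))
    return rss, intercept, slope


def kernel_rss(x, y, h):
    """Residual sum of squares of a Nadaraya-Watson fit (Gaussian kernel)."""
    rss = 0.0
    for xi, yi in zip(x, y):
        w = [math.exp(-0.5 * ((xj - xi) / h) ** 2) for xj in x]
        fit = sum(wj * yj for wj, yj in zip(w, y)) / sum(w)
        rss += (yi - fit) ** 2
    return rss


def glr_statistic(x, y, h):
    """T = (n/2) * log(RSS_null / RSS_alternative) under Gaussian errors."""
    rss0, _, _ = linear_rss(x, y)
    return 0.5 * len(x) * math.log(rss0 / kernel_rss(x, y, h))


def monte_carlo_cutoff(x, y, h, level=0.05, n_sim=200, seed=0):
    """Upper `level` quantile of T simulated under the fitted linear null."""
    rng = random.Random(seed)
    rss0, intercept, slope = linear_rss(x, y)
    sigma = math.sqrt(rss0 / len(x))
    sims = sorted(
        glr_statistic(
            x, [intercept + slope * a + sigma * rng.gauss(0.0, 1.0) for a in x], h
        )
        for _ in range(n_sim)
    )
    return sims[min(n_sim - 1, math.ceil((1.0 - level) * n_sim) - 1)]
```

The null is rejected when `glr_statistic` exceeds `monte_carlo_cutoff`. Because
the simulated statistic does not depend on the fitted intercept, slope or error
scale in this Gaussian setting, the simulation mirrors the role played by the
Wilks phenomenon in the text.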

7.5. Simultaneous confidence band 

Let $\mu$ be a function of interest. Suppose our goal is to test the null
hypothesis $H_0: \mu = \mu_\theta$ for a parametric form $\mu_\theta$ with
unknown parameter $\theta$. One way of achieving this goal is to construct a
simultaneous confidence band (SCB) for $\mu$. For $\alpha \in (0, 1)$ and a pair
of functions $\ell_n(\cdot)$ and $u_n(\cdot)$ based on data, we say that
$\ell_n(\cdot)$ and $u_n(\cdot)$ form an SCB for $\mu$ on a bounded interval
$[T_1, T_2]$ with asymptotically correct nominal level $(1 - \alpha)$ if

$$\lim_{n\to\infty} P\{\ell_n(x) \le \mu(x) \le u_n(x) \text{ for all } x \in [T_1, T_2]\} = 1 - \alpha. \qquad (7.3)$$

A typical value of $\alpha$ is 5%. With the SCB, we can test the parametric
hypothesis $H_0: \mu = \mu_\theta$ based on the following procedure:

(a) Construct a $(1 - \alpha)$ nonparametric SCB for $\mu$:
$[\ell_n(\cdot), u_n(\cdot)]$;

(b) Under $H_0$, apply parametric methods to obtain an estimate $\hat\theta$ of
$\theta$;

(c) Check whether $\ell_n(x) \le \mu_{\hat\theta}(x) \le u_n(x)$ holds for all
$x \in [T_1, T_2]$, that is, whether the parametric estimate
$\mu_{\hat\theta}$ is entirely contained within the SCB. If so, then we accept
$H_0$ at level $\alpha$. Otherwise $H_0$ is rejected.
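
Step (c) of the procedure above amounts to a containment check. The sketch
below assumes that the band and the fitted parametric curve have already been
evaluated on a common finite grid over $[T_1, T_2]$, so the "for all $x$"
requirement is only approximated; the function name `scb_test` is hypothetical
and the construction of the band itself is not shown.

```python
def scb_test(lower, upper, parametric):
    """Check whether the parametric curve lies inside the band on a grid.

    Returns (accept, violations), where `violations` lists the grid indices
    at which the parametric value falls outside [lower, upper].
    """
    if not (len(lower) == len(upper) == len(parametric)):
        raise ValueError("all three curves must share the same grid")
    violations = [
        i
        for i, (lo, hi, value) in enumerate(zip(lower, upper, parametric))
        if not lo <= value <= hi
    ]
    return len(violations) == 0, violations
```

Returning the violating grid points, rather than only the decision, shows where
on $[T_1, T_2]$ the parametric form departs from the data.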

The first work on SCB construction appears in Bickel and Rosenblatt [38] for
nonparametric kernel density estimation with iid data. The idea is extended to
SCB construction for mean regression and time trend functions by Johnston
[113], Härdle [98], Knafl et al. [121] and Eubank and Speckman [71] for iid
data, and by Wu and Zhao [164] for non-stationary time trends with time series
errors. More recently, Zhao and Wu [173] have successfully applied the SCB
based approach to the model validation problem for the S&P 500 index. They find
that an AR(1)-ARCH(1) model is an adequate fit for the S&P 500 index returns.

7.6. Density based test 

The basic idea of density based tests is to measure the distance between the
nonparametric density estimate and the parametric density estimate, with a
large distance indicating inadequacy of the parametric model. We can use two
different densities: the marginal density and the transition density.

Consider data $\{X_{i\Delta}\}_{0\le i\le n}$ from model (2.7). Under
$H_0: (\mu, \sigma) = (\mu_\theta, \sigma_\theta)$, the theoretical marginal
density is given by $f_\theta = f_{\mu_\theta,\sigma_\theta}$ in (6.1). The
model validation procedure based on the marginal density works as follows:

(a) Under $H_0$, obtain an estimate $\hat\theta$ of the parameter $\theta$
using the parametric methods in Section 5.

(b) Obtain the parametric density estimate
$f_{\hat\theta} = f_{\mu_{\hat\theta},\sigma_{\hat\theta}}$.

(c) Construct the nonparametric kernel density estimate $\hat f$ as in (6.2).

(d) Compute a certain distance $D_n$ between the parametric density estimate
$f_{\hat\theta}$ and the nonparametric density estimate $\hat f$.

Reject $H_0$ if $D_n$ exceeds a certain level. The critical value can be
obtained by either studying the asymptotic behavior of $D_n$ or through
bootstrap methods.
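
The procedure above can be sketched in a toy setting. The assumptions here are
not from the text: the observations are treated as iid, the parametric family
is Gaussian (standing in for the marginal density implied by a diffusion
model), the distance is an integrated squared difference approximated by a
Riemann sum, and the critical value comes from a parametric bootstrap. The
function names and the bandwidth argument are hypothetical.

```python
import math
import random


def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def kernel_density(data, x, bandwidth):
    """Gaussian-kernel density estimate at the point x."""
    return sum(normal_pdf(x, xi, bandwidth) for xi in data) / len(data)


def density_distance(data, bandwidth, grid_size=100):
    """Return (D_n, fitted mean, fitted sd), where D_n approximates the
    integrated squared difference between the kernel estimate and the
    fitted normal density."""
    n = len(data)
    mean = sum(data) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in data) / n)
    lo, hi = min(data) - 3 * bandwidth, max(data) + 3 * bandwidth
    step = (hi - lo) / grid_size
    total = 0.0
    for k in range(grid_size + 1):
        x = lo + k * step
        diff = kernel_density(data, x, bandwidth) - normal_pdf(x, mean, sd)
        total += diff * diff
    return total * step, mean, sd


def bootstrap_critical_value(data, bandwidth, level=0.05, n_boot=100, seed=0):
    """Upper `level` quantile of D_n for samples drawn from the fitted null."""
    rng = random.Random(seed)
    _, mean, sd = density_distance(data, bandwidth)
    n = len(data)
    sims = sorted(
        density_distance([rng.gauss(mean, sd) for _ in range(n)], bandwidth)[0]
        for _ in range(n_boot)
    )
    return sims[min(n_boot - 1, math.ceil((1.0 - level) * n_boot) - 1)]
```

For clearly bimodal data the distance to the fitted normal density is much
larger than the bootstrap critical value, so the Gaussian null is rejected.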

This idea is applied by Aït-Sahalia [2] to daily data of the 7-day Eurodollar
deposit rate during the period from 1 June 1973 to 25 February 1995. He rejects
all existing parametric models and proposes the new model (2.6). Clearly, this
idea can be extended to deal with time series models by utilizing the
convolution density estimate in Section 6.1. Recently, Zhao [170] studies the
model validation problem by constructing a nonparametric simultaneous
confidence band (see also Section 7.5) for the marginal density and checking
whether the implied parametric density estimate is entirely contained within
the constructed band. Zhao [170] demonstrates that this density band based
approach is widely applicable. For other contributions on density based tests,
see Gao and King [88] and Hong and Li [102].

Marginal density only captures part of the distributional properties of
stochastic processes. For a Markov chain, another natural choice is the
transition density. For model validation based on the transition density, one
can use the same procedure listed above, replacing the marginal density by the
corresponding transition density. As in Section 6.1, one often needs to turn to
the Euler approximation scheme (5.2) or the approximation in Aït-Sahalia [4].
Aït-Sahalia et al. [6] propose a transition density based test for continuous
diffusion models and jump diffusion models. One could argue that a more
efficient model validation procedure would incorporate information from both
marginal and transition densities. This would be a future research direction.

7.7. Other tests

We mention some other representative works. Anderson [11] introduces a
goodness-of-fit test based on the spectral density. Chen et al. [49] propose an
empirical likelihood based test which is shown to be asymptotically equivalent
to the nonparametric curve regression based test in Section 7.2. More
references are collected in Chapters 3 and 5 of Gao [87], where semi-parametric
specifications are also studied.

8. Tools for asymptotics

8.1. Mixing conditions

The Markovian property and mixing conditions for Markov chains play an
important role in large sample theory in financial econometrics. A popular
mixing coefficient is the $\rho$-mixing coefficient. Let $\{X_t\}_{t\ge0}$ be a
continuous-time stationary Markov process. For a random variable $Z$ we write
$\|Z\| = [E(Z^2)]^{1/2}$ if the latter is finite.

Denote by $\mathcal{G}_t$ and $\mathcal{G}^t$ the sigma fields generated by
$\{X_s\}_{s\le t}$ and $\{X_s\}_{s\ge t}$, respectively. Let $\mathcal{L}^2$ be
the set of square integrable random variables. Then the $\rho$-mixing
coefficient of $\{X_t\}_{t\ge0}$ is defined as

$$\rho_t = \sup_{s\ge0} \rho(\mathcal{G}_s, \mathcal{G}^{s+t}),$$

with

$$\rho(\mathcal{G}, \mathcal{H}) = \sup\{|\mathrm{Corr}(G, H)| : G \in \mathcal{G},\ H \in \mathcal{H},\ G, H \in \mathcal{L}^2\},$$

where $G \in \mathcal{G}$ ($H \in \mathcal{H}$) means that $G$ ($H$) is
measurable with respect to $\mathcal{G}$ ($\mathcal{H}$). For a random variable
$X$, denote by $\sigma(X)$ the sigma field generated by $X$. Since
$\{X_t\}_{t\ge0}$ is a stationary Markov process, by Theorem 4.1 in Bradley
(1986),

$$\rho_t = \rho(\sigma(X_0), \sigma(X_t)) = \sup_{g_1(X_0),\, g_2(X_t) \in \mathcal{L}^2} |\mathrm{Corr}[g_1(X_0), g_2(X_t)]|. \qquad (8.1)$$

Let $\{\mathcal{J}_t\}_{t\ge0}$ be a family of operators defined by
$\mathcal{J}_t g(x) = E[g(X_t) | X_0 = x]$ for $\|g(X_t)\| < \infty$. Assume
that $E[g_1(X_0)] = E[g_2(X_0)] = 0$. By the Cauchy-Schwarz inequality,

$$|\mathrm{Cov}[g_1(X_0), g_2(X_t)]| = |E\{g_1(X_0) E[g_2(X_t) | X_0]\}| \le \|g_1(X_0)\| \, \|\mathcal{J}_t g_2(X_0)\|,$$

which entails by (8.1)

$$\rho_t \le \sup_{E[g(X_0)]=0} \frac{\|\mathcal{J}_t g(X_0)\|}{\|g(X_0)\|}.$$

Notice that the stationarity and Markovian property of $\{X_t\}_{t\ge0}$ imply
that $\{\mathcal{J}_t\}_{t\ge0}$ forms a semigroup in the sense that
$\mathcal{J}_{s+t} = \mathcal{J}_s\mathcal{J}_t$. If there exist a fixed
$t_0 > 0$ and a constant $\lambda \in (0, 1)$ such that
$\|\mathcal{J}_{t_0} g(X_0)\| \le \lambda \|g(X_0)\|$ holds for all measurable
functions $g$ satisfying $E[g(X_0)] = 0$, then we say that the operator
$\mathcal{J}_{t_0}$ is a strong contraction. Under this condition, the family
$\{\mathcal{J}_t\}_{t\ge0}$ of operators is exponentially contracting. That is,
$\|\mathcal{J}_t g(X_0)\| \le \|g(X_0)\| \, O(\lambda^{t/t_0})$ for all
$\|g(X_0)\| < \infty$, $E[g(X_0)] = 0$ and $t \ge 0$; see Banon [25].
Consequently, the process $\{X_t\}_{t\ge0}$ has an exponentially decaying
$\rho$-mixing coefficient and is called geometric $\rho$-mixing.

Sufficient conditions under which the operators $\{\mathcal{J}_t\}_{t\ge0}$
possess the exponentially contracting property have been obtained in Banon
[25], Hansen and Scheinkman [97], and Genon-Catalot et al. [90], to name a few.
Therefore, under such conditions, the discrete observations $\{X_i\}_{i\ge0}$
are $\rho$-mixing with an exponentially decaying $\rho$-mixing coefficient.
Polynomial mixing conditions are also available in Veretennikov [157]. For
limit theorems under various mixing conditions, see the papers by Bradley [44],
Hannan [94], Jones [114], Davydov [54], and Dehling et al. [58].
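
The contraction property can be checked by hand in a toy case. The sketch below
assumes a discrete-time, two-state stationary Markov chain (not a model from
the text) with transition probabilities $P(0 \to 1) = a$ and $P(1 \to 0) = b$.
Every mean-zero function is then a multiple of $g = (a, -b)$, which is an
eigenfunction of the one-step operator with eigenvalue $1 - a - b$, so the
ratio $\|\mathcal{J}_t g\| / \|g\|$ equals $|1 - a - b|^t$. The function names
are hypothetical.

```python
import math


def stationary_distribution(a, b):
    """Stationary law of the chain with P(0 -> 1) = a and P(1 -> 0) = b."""
    return (b / (a + b), a / (a + b))


def apply_transition(g, a, b):
    """(J g)(x) = E[g(X_1) | X_0 = x] for the two-state chain."""
    return ((1 - a) * g[0] + a * g[1], b * g[0] + (1 - b) * g[1])


def l2_norm(g, pi):
    """Norm [E g(X_0)^2]^(1/2) under the stationary law pi."""
    return math.sqrt(pi[0] * g[0] ** 2 + pi[1] * g[1] ** 2)


def contraction_ratio(a, b, t):
    """||J_t g|| / ||g|| for the mean-zero function g = (a, -b)."""
    pi = stationary_distribution(a, b)
    g = (a, -b)  # E[g(X_0)] = pi[0] * a - pi[1] * b = 0
    h = g
    for _ in range(t):
        h = apply_transition(h, a, b)
    return l2_norm(h, pi) / l2_norm(g, pi)
```

With $a = 0.3$ and $b = 0.2$ the one-step ratio is $0.5$, and after four steps
it is $0.5^4 = 0.0625$, which illustrates the geometric decay of the bound on
the $\rho$-mixing coefficient.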

8.2. Physical dependence measure 

Wu [160] proposes the concept of the physical dependence measure, a very
powerful tool in studying nonlinear time series, for example, (3.1) or (3.3)
and (3.4). To fix the idea, consider the stationary process given by

$$X_i = G(\ldots, \varepsilon_{i-1}, \varepsilon_i), \quad i \in \mathbb{Z}, \quad \varepsilon_i: \text{iid}, \qquad (8.2)$$

where $G$ is a measurable function such that $X_i$ is well defined. Examples of
(8.2) include the popular ARMA and fractional ARIMA linear models, the
nonlinear NARCH model (3.1), and the random iterated functions
$X_i = F(X_{i-1}, \varepsilon_i)$ for a random map $F(\cdot, \varepsilon_i)$
that maps $X_{i-1}$ to $X_i$ depending on the innovation $\varepsilon_i$, among
others.

Let $(\varepsilon_i')_{i\in\mathbb{Z}}$ be an iid copy of
$(\varepsilon_i)_{i\in\mathbb{Z}}$. Define

$$X_i' = G(\ldots, \varepsilon_{-1}, \varepsilon_0', \varepsilon_1, \ldots, \varepsilon_i).$$

Then $X_i'$ is a coupled version of $X_i$ with the innovation $\varepsilon_0$
therein being replaced by the iid copy $\varepsilon_0'$. Following Wu [160],
define the coupling coefficient:

$$\theta_q(i) = \|X_i' - X_i\|_q, \quad \Theta_q(n) = \sum_{i=0}^{n} \theta_q(i), \quad q > 0, \qquad (8.3)$$

where, for a random variable $Z$, $\|Z\|_q = [E(|Z|^q)]^{1/q}$ if the latter is
finite. By the construction of $X_i$ and $X_i'$, $\theta_q(i)$ can be viewed as
the contribution of $\varepsilon_0$ in predicting the future value $X_i$.
Therefore, $\Theta_q(n)$ can be viewed as the cumulative contribution of
$\varepsilon_0$ in predicting the whole future sequence $X_i$,
$0 \le i \le n$. Let $\Theta_q(\infty) = \lim_{n\to\infty} \Theta_q(n)$. If
$\Theta_q(\infty) < \infty$, then we may interpret $(X_i)_{i\ge0}$ as a process
with short-range dependence.
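
The coupling construction lends itself to a direct Monte Carlo approximation.
The sketch below assumes a model of the iterated form
$X_i = F(X_{i-1}, \varepsilon_i)$ with standard normal innovations and
approximates the stationary start by a burn-in period; the function name
`physical_dependence`, the argument `step` (playing the role of $F$) and all
tuning defaults are hypothetical.

```python
import random


def physical_dependence(step, q=2.0, max_lag=10, n_rep=2000, burn_in=200, seed=0):
    """Monte Carlo estimates of theta_q(i), i = 0, ..., max_lag, for the
    recursion X_i = step(X_{i-1}, eps_i) with N(0, 1) innovations."""
    rng = random.Random(seed)
    sums = [0.0] * (max_lag + 1)
    for _ in range(n_rep):
        x = 0.0
        for _ in range(burn_in):  # assumption: burn-in approximates stationarity
            x = step(x, rng.gauss(0.0, 1.0))
        # Replace eps_0 by an independent copy in the coupled path only.
        e0, e0_copy = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        x_orig, x_coupled = step(x, e0), step(x, e0_copy)
        sums[0] += abs(x_orig - x_coupled) ** q
        for i in range(1, max_lag + 1):
            e = rng.gauss(0.0, 1.0)  # innovation shared by both paths
            x_orig, x_coupled = step(x_orig, e), step(x_coupled, e)
            sums[i] += abs(x_orig - x_coupled) ** q
    return [(s / n_rep) ** (1.0 / q) for s in sums]
```

For the AR(1) recursion `step = lambda x, e: 0.5 * x + e`, the two paths differ
by exactly $0.5^i(\varepsilon_0 - \varepsilon_0')$ at lag $i$, so successive
estimates decay by the factor $0.5$ and $\Theta_q(\infty)$ is finite.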

Dedecker and Prieur [56] consider the following coupling coefficients,

$$\theta_q^*(i) = \|X_i^* - X_i\|_q, \quad \text{where } X_i^* = G(\ldots, \varepsilon_{-1}', \varepsilon_0', \varepsilon_1, \ldots, \varepsilon_i). \qquad (8.4)$$

The difference between the two coupled versions $X_i'$ and $X_i^*$ of $X_i$ is
that the former replaces $\varepsilon_0$ with $\varepsilon_0'$ while the latter
replaces $\varepsilon_j$ with $\varepsilon_j'$ for all $j \le 0$. See [56] for
more details. If $q \ge 1$, by the triangle inequality, we have
$\theta_q^*(i) \le \sum_{j=i}^{\infty} \theta_q(j)$ and
$\theta_q(i) \le \theta_q^*(i+1) + \theta_q^*(i)$. If
$\theta_q^*(i) = O(\rho^i)$ for some $\rho \in (0, 1)$, then the two coupling
coefficients are equivalent. In many other cases, $\theta_q(i)$ is often
smaller than $\theta_q^*(i)$. Consider, for example, the linear process
$X_i = \sum_{j=0}^{\infty} a_j \varepsilon_{i-j}$. If
$E(|\varepsilon_0|^q) < \infty$ for some $q \ge 1$, then
$\theta_q(i) = O(|a_i|)$ and
$\theta_q^*(i) = O(\sum_{j=i}^{\infty} |a_j|)$.

Let $S_0 = 0$ and
$S_{nt} = S_{\lfloor nt \rfloor} + (nt - \lfloor nt \rfloor) X_{\lceil nt \rceil}$,
$0 \le t \le 1$, be the partial sum process of $(X_i)_{i=1}^{n}$. The following
result is useful in studying the asymptotic behavior of nonlinear time series.

(i) [Wu [160], Dedecker and Merlevède [55]]. Assume
$\Theta_2(\infty) < \infty$. Then

$$\{S_{nt}/\sqrt{n},\ 0 \le t \le 1\} \Rightarrow \{\sigma W_t,\ 0 \le t \le 1\}, \qquad (8.5)$$

where
$\sigma = \|\sum_{i=0}^{\infty} [E(X_i | \ldots, \varepsilon_{-1}, \varepsilon_0) - E(X_i | \ldots, \varepsilon_{-2}, \varepsilon_{-1})]\|_2 < \infty$,
and $\{W_t\}_{0\le t\le1}$ is a standard Brownian motion.

(ii) [Wu [161]]. Further assume that
$\sum_{i=1}^{\infty} i\,\theta_q(i) < \infty$ for some $2 < q \le 4$. Then on a
possibly richer probability space, there exists a Brownian motion
$\{W_t\}_{t\ge0}$ such that $S_{nt}$ can be uniformly approximated by
$\sigma W_{nt}$ in the following sense:

$$\sup_{0\le t\le1} |S_{nt} - \sigma W_{nt}| = O_{a.s.}[n^{1/q}(\log n)^{1/2+1/q}(\log\log n)^{2/q}]. \qquad (8.6)$$

The convergence (8.5) and the approximation (8.6) have different ranges of
applicability. The convergence in (8.5) can often be used in studying the
asymptotic behavior of parametric methods, for example, maximum likelihood
estimates, least-squares estimates, M-estimation, and the generalized method of
moments. The approximation in (8.6) is particularly useful in nonparametric
statistical inference. For example, in nonparametric inference for a time trend
function, one needs to deal with quantities of the following form

$$V_n(t) = \sum_{i=1}^{n} \omega_n(i/n, t) X_i, \qquad (8.7)$$

where $\omega_n(i/n, t)$, $1 \le i \le n$, are non-negative weights summing to
one. Due to the dependence in $(X_i)_{1\le i\le n}$ and the non-stationarity
introduced by $\omega_n(i/n, t)$, it is usually difficult to study $V_n(t)$
directly. Write $\omega_i = \omega_n(i/n, t)$. Assume that (8.6) holds with
$q = 4$. By the summation by parts formula,

$$V_n(t) = \sum_{i=1}^{n} \omega_i (S_i - S_{i-1}) = \sum_{i=2}^{n} (\omega_{i-1} - \omega_i) S_{i-1} + \omega_n S_n$$
$$= \sum_{i=2}^{n} (\omega_{i-1} - \omega_i) \sigma W_{i-1} + \omega_n \sigma W_n + O_{a.s.}(n^{1/4} \log n)\,\Omega_n$$
$$= \sigma \sum_{i=1}^{n} \omega_i (W_i - W_{i-1}) + O_{a.s.}(n^{1/4} \log n)\,\Omega_n,$$

where $\Omega_n = \sum_{i=2}^{n} |\omega_i - \omega_{i-1}| + \omega_n$. For many
nonparametric estimates with bandwidth $b_n$, $\Omega_n = O[(nb_n)^{-1}]$. See
Wu and Zhao [164].
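
The summation by parts identity and the size of $\Omega_n$ are easy to verify
numerically. In the sketch below the Epanechnikov kernel weights are an
assumption chosen for illustration (the text does not fix a kernel), and the
function names are hypothetical.

```python
def weighted_sum_direct(weights, x):
    """V_n = sum_i w_i x_i."""
    return sum(w * v for w, v in zip(weights, x))


def weighted_sum_by_parts(weights, x):
    """sum_{i=2}^n (w_{i-1} - w_i) S_{i-1} + w_n S_n, with S_k = x_1 + ... + x_k."""
    partial = []
    running = 0.0
    for v in x:
        running += v
        partial.append(running)  # partial[k - 1] holds S_k
    total = weights[-1] * partial[-1]
    for j in range(1, len(x)):  # j is the 0-based position of w_i, i = j + 1
        total += (weights[j - 1] - weights[j]) * partial[j - 1]
    return total


def total_variation_term(weights):
    """Omega_n = sum_{i=2}^n |w_i - w_{i-1}| + w_n."""
    jumps = sum(abs(weights[j] - weights[j - 1]) for j in range(1, len(weights)))
    return jumps + weights[-1]


def kernel_weights(n, t, b):
    """Normalized Epanechnikov weights w_n(i/n, t), i = 1, ..., n (assumed kernel)."""
    raw = [max(0.0, 0.75 * (1.0 - ((i / n - t) / b) ** 2)) for i in range(1, n + 1)]
    total = sum(raw)
    return [r / total for r in raw]
```

With $n = 200$, $t = 0.5$ and $b_n = 0.1$, the product $n b_n \Omega_n$ is
close to $1.5$, in line with the order $(nb_n)^{-1}$ quoted above.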

Because $\theta_q(i)$ is directly related to the data-generating mechanism, it
often has a tractable bound. For example, consider the linear process
$X_i = \sum_{j=0}^{\infty} a_j \varepsilon_{i-j}$; then
$\theta_q(i) = O(|a_i|)$ provided that $E(|\varepsilon_0|^q) < \infty$. For
nonlinear models defined by the recursive equation

$$X_i = R_{\varepsilon_i}(X_{i-1}), \qquad (8.8)$$

for a random function $R_{\varepsilon_i}$ depending on $\varepsilon_i$, Wu and
Shao [163] obtain sufficient conditions under which: (i) (8.8) admits a unique
stationary solution of the form (8.2); and (ii) the function $G$ satisfies the
geometric moment contraction property $\theta_q(i) = O(\rho^i)$ for some
$\rho \in (0, 1)$.



8.3. Martingale decomposition 

The physical dependence measure in Section 8.2 becomes more powerful in
nonparametric inference when it is used in conjunction with martingale
decomposition. Here we shall illustrate the idea using the nonparametric kernel
density estimate. Let $\{X_i\}_{1\le i\le n}$ be a stationary Markov chain with
transition density function $p(x|x')$ and invariant density $f(x)$. Consider
the popular nonparametric kernel density estimate

$$\hat f(x) = \frac{1}{nb_n} \sum_{i=1}^{n} K_{b_n}(X_i - x), \quad \text{where } K_{b_n}(u) = K(u/b_n). \qquad (8.9)$$

Let $\mathcal{F}_i$ be the sigma field generated by $X_j$, $j \le i$. By the
Markovian property,

$$E[K_{b_n}(X_i - x) | \mathcal{F}_{i-1}] = b_n \int K(u)\, p(x - ub_n | X_{i-1})\, du.$$

Let

$$I_n(x) = \sum_{i=1}^{n} \{p(x | X_{i-1}) - E[p(x | X_{i-1})]\}.$$

Then we have the decomposition

$$\sum_{i=1}^{n} [K_{b_n}(X_i - x) - E K_{b_n}(X_i - x)] = M_n(x) + b_n \int K(u)\, I_n(x - ub_n)\, du,$$

where

$$M_n(x) = \sum_{i=1}^{n} \{K_{b_n}(X_i - x) - E[K_{b_n}(X_i - x) | \mathcal{F}_{i-1}]\}.$$

Notice that $M_n(x)$ is a martingale with respect to $\mathcal{F}_n$ and
therefore various martingale results are applicable. Using the physical
dependence measure in Section 8.2, one can show that, for certain short-range
dependent processes, $b_n \int K(u)\, I_n(x - ub_n)\, du$ is of order
$O_p(\sqrt{n}\, b_n)$ and negligible relative to the martingale part
$M_n(x)$. Similar martingale decomposition techniques have been successfully
applied to nonparametric inference in Zhao and Wu [173] and Zhao [170].
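
The decomposition can be made fully explicit in a special case. The sketch
below assumes a stationary Gaussian AR(1) chain
$X_i = aX_{i-1} + s\varepsilon_i$ and a standard normal kernel $K$, neither of
which is prescribed by the text. Under these assumptions the conditional mean
of $K_{b}(X_i - x)$ given $X_{i-1}$ equals $b$ times the
$N(aX_{i-1}, s^2 + b^2)$ density at $x$, and the unconditional mean equals $b$
times the $N(0, s^2/(1-a^2) + b^2)$ density at $x$. The function names are
hypothetical.

```python
import math
import random


def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def simulate_ar1(n, a, s, rng):
    """Return X_0, ..., X_n with X_0 drawn from the stationary law."""
    x = [rng.gauss(0.0, s / math.sqrt(1.0 - a * a))]
    for _ in range(n):
        x.append(a * x[-1] + s * rng.gauss(0.0, 1.0))
    return x


def conditional_kernel_mean(prev, point, b, a, s):
    """E[K((X_i - point) / b) | X_{i-1} = prev] for the Gaussian kernel K."""
    return b * normal_pdf(point, a * prev, s * s + b * b)


def decompose(x, point, b, a, s):
    """Return (centered kernel sum, martingale part M_n, remainder) at `point`."""
    stationary_var = s * s / (1.0 - a * a)
    uncond = b * normal_pdf(point, 0.0, stationary_var + b * b)
    centered = martingale = remainder = 0.0
    for prev, cur in zip(x[:-1], x[1:]):
        kernel = normal_pdf((cur - point) / b, 0.0, 1.0)  # K_b(X_i - point)
        cond = conditional_kernel_mean(prev, point, b, a, s)
        centered += kernel - uncond
        martingale += kernel - cond   # martingale difference
        remainder += cond - uncond    # b * integral of K(u) I_n(point - u b) du
    return centered, martingale, remainder
```

Comparing the magnitudes of the two returned parts over repeated simulations
gives a rough empirical view of the claim that the remainder is negligible
relative to the martingale part when the bandwidth is small.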